Preface


This book is the result of more than five years of intensive research in collaboration with a 
large number of people. Since the beginning, our goal has been to understand at a deeper 
level how information-theoretic security ideas can help build more secure networks and 
communication systems. Back in 2008, the actual plan was to finish the manuscript within 
one year, which for some reason seemed a fairly reasonable proposition at that time. 
Needless to say, we were thoroughly mistaken. The pace at which physical-layer security 
topics have found their way into the main journals and conferences in communications 
and information theory is simply staggering. In fact, there is now a vibrant scientific 
community uncovering the benefits of looking at the physical layer from a security point 
of view and producing new results every day. Writing a book on physical-layer security 
thus felt like shooting at not one but multiple moving targets. 

To preserve our sanity we decided to go back to basics and focus on how to bridge the 
gap between theory and practice. It did not take long to realize that the book would have 
to appeal simultaneously to information theorists, cryptographers, and network-security 
specialists. More precisely, the material could and should provide a common ground for 
fruitful interactions between those who speak the language of security and those who for 
a very long time focused mostly on the challenges of communicating over noisy channels.
Therefore, we opted for a mathematical treatment that addresses the fundamental aspects 
of information-theoretic security, while providing enough background on cryptographic 
protocols to allow an eclectic and synergistic approach to the design of security systems. 

The book is intended for several different groups: (a) communication engineers and 
security specialists who wish to understand the fundamentals of physical-layer security 
and apply them in the development of real-life systems, (b) scientists who aim at creating 
new knowledge in information-theoretic security and applications, (c) graduate students 
who wish to be trained in the fundamental techniques, and (d) decision makers who seek 
to evaluate the potential benefits of physical-layer security. If this book leads to many 
exciting discussions at the white board among diverse groups of people, then our goal 
will have been achieved. 

Finally, we would like to acknowledge all our colleagues, students, and friends who 
encouraged us and supported us during the course of this project. First and foremost, we 
are deeply grateful to Steve McLaughlin, who initiated the project and let us run with 
it. Special thanks are also due to Phil Meyer and Sarah Matthews from Cambridge Uni- 
versity Press for their endless patience as we postponed the delivery of the manuscript 
countless times. We express our sincere gratitude to Demijan Klinc and Alexandre
Pierrot, who proofread the entire book in detail many times and relentlessly asked 
for clarification, simplification, and consistent notation. We would like to thank Glenn 
Bradford, Michael Dickens, Brian Dunn, Jing Huang, Utsaw Kumar, Ebrahim Molavian- 
Jazi, and Zhanwei Sun for attending EE 87023 at the University of Notre Dame when 
the book was still a set of immature lecture notes. The organization and presentation 
of the book have greatly benefited from their candid comments. Thanks are also due 
to Nick Laneman, who provided invaluable support. Willie Harrison, Xiang He, Mari 
Kobayashi, Ashish Khisti, Francesco Renna, Osvaldo Simeone, Andrew Thangaraj, and 
Aylin Yener offered very constructive comments. The book also benefited greatly from 
many discussions with Prakash Narayan, Imre Csiszár, Muriel Médard, Ralf Koetter, and
Pedro Pinto, who generously shared their knowledge with us. Insights from research by 
Miguel Rodrigues, Luisa Lima, João Paulo Vilela, Paulo Oliveira, Gerhard Maierbacher,
Tiago Vinhoza, and João Almeida at the University of Porto also helped shape the views 
expressed in this volume. 


Matthieu Bloch, Georgia Institute of Technology 
João Barros, University of Porto
Preliminaries


An information-theoretic approach to 
physical-layer security 


A simple look at today’s information and communication infrastructure is sufficient for 
one to appreciate the elegance of the layered networking architecture. As networks flour- 
ish worldwide, the fundamental problems of transmission, routing, resource allocation, 
end-to-end reliability, and congestion control are assigned to different layers of proto- 
cols, each with its own specific tools and network abstractions. However, the conceptual 
beauty of the layered protocol stack is not easily found when we turn our attention to 
the issue of network security. In the early days of the Internet, possibly because network 
access was very limited and tightly controlled, network security was not yet viewed 
as a primary concern for computer users and system administrators. This perception 
changed with the increase in network connections. Technical solutions, such as personnel 
access controls, password protection, and end-to-end encryption, were developed soon 
after. The steady growth in connectivity, fostered by the advent of electronic-commerce 
applications and the ubiquity of wireless communications, remains unhindered and has 
resulted in an unprecedented awareness of the importance of network security in all its 
guises. 

The standard practice of adding authentication and encryption to the existing protocols 
at the various communication layers has led to what could be rightly classified as a 
patchwork of security mechanisms. Given that data security is so critically important, 
it is reasonable to argue that security measures should be implemented at all layers 
where this can be done in a cost-effective manner. Interestingly, one layer has remained 
almost ignored in this shift towards secure communication: the physical layer, which lies 
at the lowest end of the protocol stack and converts bits of information into modulated 
signals. The state of affairs described is all the more striking since randomness, generally 
perceived as a key element of secrecy systems, is abundantly available in the stochastic 
nature of the noise that is intrinsic to the physical communication channel. On account 
of this observation, this book is entirely devoted to an emerging paradigm: security 
technologies that are embedded at the physical layer of the protocol architecture, a 
segment of the system where little security exists today. 

The absence of a comprehensive physical-layer security approach may be partly 
explained by invoking the way security issues are taught. A typical graduate course 
in cryptography and security often starts with a discussion of Shannon’s information-theoretic
notion of perfect secrecy, but information-theoretic security is quickly discarded and
regarded as no more than a beautiful, yet infeasible, theoretical construct. Such an
exposition is designed to motivate the use of state-of-the-art encryption algorithms,
which are insensitive to the characteristics of the communication channel and rely on
mathematical operations assumed to be hard to compute, such as the factorization of
large integers into primes.

Figure 1.1 Shannon’s model of a secrecy system.

In this introductory chapter, we approach the subject in a different way. First, we 
give a bird’s-eye view of the basic concepts of information-theoretic security and how 
they differ from classical cryptography. Then, we discuss in general terms some of the 
major achievements of information-theoretic security and give some examples of its 
potential to strengthen the security of the physical layer. The main idea is to exploit the 
randomness of noisy communication channels to guarantee that a malicious eavesdropper 
obtains no information about the sent messages: security is ensured not relative to a hard 
mathematical problem but by the physical uncertainty inherent to the noisy channel. 


Shannon’s perfect secrecy 


Roughly speaking, the objective of secure communication is twofold: upon transmission
of a message, the intended receivers should recover the message without errors while 
nobody else should acquire any information. This fundamental principle was formalized 
by Shannon in his 1949 paper [1], using the model of a secrecy system illustrated in 
Figure 1.1. A transmitter attempts to send a message M to a legitimate receiver by 
encoding it into a codeword X.¹ During transmission, the codeword is observed by an
eavesdropper (called the enemy cryptanalyst in Shannon’s original model) without any 
degradation, which corresponds to a worst-case scenario in which the communication 
channel is error-free. In real systems, where some form of noise is almost always present, 
this theoretical assumption corresponds to the existence of powerful error-correction 
mechanisms, which ensure that the message can be recovered with arbitrarily small 
probability of error. As is customary in cryptography, we often refer to the transmitter 
as “Alice,” to the legitimate receiver as “Bob,” and to the eavesdropper as “Eve.” 

In this worst-case scenario, the legitimate receiver must have some advantage over the 
eavesdropper, otherwise the latter would be able to recover the message M as well. The 
solution to this problem lies in the use of a secret key K, known only to the transmitter
and the legitimate receiver. The codeword X is then obtained by computing a function
of the message M and the secret key K.

¹ In cryptography, X is also called a cryptogram or ciphertext.

Table 1.1 Example of a one-time pad

Message M                  0 1 0 1 0 0 0 1 1 0 1
Key K                      1 0 0 1 1 0 0 0 1 0 1
Cryptogram X = M ⊕ K       1 1 0 0 1 0 0 1 0 0 0

Shannon formalized the notion of secrecy by quantifying the average uncertainty of 
the eavesdropper. In information-theoretic terms, messages and codewords are treated 
as random variables, and secrecy is measured in terms of the conditional entropy of 
the message given the codeword, denoted as H(M|X). The quantity H(M|X) is also 
called the eavesdropper’s equivocation; perfect secrecy is achieved if the eavesdropper’s 
equivocation equals the a priori uncertainty one could have about the message, that is

$$H(M|X) = H(M).$$


This equation implies that the codeword X is statistically independent of the message 
M. This statistical independence ensures that there exists no algorithm that would allow
the cryptanalyst to extract information about the message. We will see in Chapter 3 that
perfect secrecy can be achieved only if $H(K) \geq H(M)$; that is, the uncertainty about the
key must be at least as large as the uncertainty about the message. In other words, we 
must have at least one secret bit for every bit of information contained in the message. 
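To make the condition H(M|X) = H(M) concrete, the sketch below enumerates a toy secrecy system and computes both entropies from the joint distribution. It assumes 2-bit messages with an arbitrary, hypothetical non-uniform prior and a uniformly distributed 2-bit key combined with the message by bitwise XOR; the helper names are ours, chosen for illustration only.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(A|B) = H(A,B) - H(B) for a joint distribution {(a, b): prob}."""
    p_b = Counter()
    for (a, b), p in joint.items():
        p_b[b] += p
    return entropy(joint) - entropy(p_b)

# Hypothetical prior on 2-bit messages (deliberately non-uniform).
prior = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Joint distribution of (M, X) with X = M XOR K and K uniform over 2-bit keys.
joint = Counter()
for m, p_m in prior.items():
    for k in product((0, 1), repeat=2):
        x = tuple(mi ^ ki for mi, ki in zip(m, k))
        joint[(m, x)] += p_m * 0.25

h_m = entropy(prior)
h_m_given_x = conditional_entropy(joint)
print(h_m, h_m_given_x)  # both approximately 1.75 bits
```

Because the key is as long as the message and uniform, observing X leaves the eavesdropper's uncertainty exactly where it started, whatever the prior on M.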

From an algorithmic perspective, perfect secrecy can be achieved by means of a simple
procedure called a one-time pad (or Vernam’s cipher), an example of which is shown in 
Table 1.1 for the case of a binary message and a binary key. The codeword is formed by 
computing the binary addition (XOR) of each message bit with a separate key bit. If the 
key bits are independent and uniformly distributed, it can be shown that the codeword is 
statistically independent of the message. To recover the message, the legitimate receiver 
need only add the codeword and the secret key. On the other hand, the eavesdropper does 
not have access to the key; therefore, observing the codeword does not change her a priori
uncertainty about the message and, if the message bits are uniformly distributed, she
cannot do better than randomly guessing them.
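The encryption and decryption steps can be sketched in a few lines of Python, reproducing the bits of Table 1.1. This is a minimal illustration, not a production cipher; the function names are hypothetical, and we assume the standard-library `secrets` module as the source of fresh key bits.

```python
import secrets

def xor_bits(a, b):
    """Bitwise binary addition (XOR) of two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("message and key must have the same length")
    return [x ^ y for x, y in zip(a, b)]

def otp_encrypt(message, key):
    return xor_bits(message, key)

def otp_decrypt(cryptogram, key):
    # Adding the key a second time cancels it, since k XOR k = 0.
    return xor_bits(cryptogram, key)

# Bits taken from Table 1.1.
M = [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
K = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1]
X = otp_encrypt(M, K)
print(X)  # [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
assert otp_decrypt(X, K) == M

# Every new message needs a fresh key drawn from a strong random source.
fresh_key = [secrets.randbits(1) for _ in M]
```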

Although the one-time pad can achieve perfect secrecy with low complexity, its 
applicability is limited by the following requirements: 


• the legitimate partners must generate and store long keys consisting of random bits;

• each key can be used only once (otherwise the cryptanalyst has a fair chance of
discovering the key);

• the key must be shared over a secure channel.
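The second requirement can be illustrated with a short sketch using hypothetical bit strings. If the same key protects two messages, adding the two cryptograms cancels the key and hands the eavesdropper the sum of the two messages; any later knowledge of one message then exposes the other and the key itself.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]

# Hypothetical example: the same key K is (wrongly) reused for two messages.
K = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1]
M1 = [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
M2 = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
X1 = xor_bits(M1, K)
X2 = xor_bits(M2, K)

# Eve never sees K, yet X1 XOR X2 = M1 XOR M2: the key cancels out.
leak = xor_bits(X1, X2)
assert leak == xor_bits(M1, M2)

# If Eve later learns M1, she recovers both M2 and the key.
assert xor_bits(leak, M1) == M2
assert xor_bits(X1, M1) == K
```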


To solve the problem of distributing long keys in a secure manner, we could be tempted 
to generate long pseudo-random sequences using a smaller seed. However, information 
theory shows that the uncertainty of the eavesdropper is upper bounded by the number 
of random key bits used. The smaller the key, the greater the probability that the
eavesdropper will succeed in extracting some information from the codeword. In this case,
the only obstacle faced by the eavesdropper is computational complexity, which leads
directly to the concept of computational security.

Figure 1.2 Wyner’s wiretap channel model.

The aforementioned caveats regarding the one-time pad are arguably responsible for 
the skepticism with which security practitioners dismiss the usefulness of information- 
theoretic security. We shall now see that a closer look at the underlying communications 
model may actually yield the solution towards wider applicability. 


Secure communication over noisy channels 


As mentioned before, random noise is an intrinsic element of almost all physical com- 
munication channels. In an effort to understand the role of noise in the context of secure 
communications, Wyner introduced the wiretap channel model illustrated in Figure 1.2. 
The main differences between this approach and Shannon’s original secrecy system are 
that 


• the legitimate transmitter encodes a message M into a codeword $X^n$ consisting of n
symbols, which is sent over a noisy channel to the legitimate receiver;

• the eavesdropper observes a noisy version, denoted by $Z^n$, of the signal $Y^n$ available
at the receiver.


In addition, Wyner suggested a new definition for the secrecy condition. Instead of 
requiring the eavesdropper’s equivocation to be exactly equal to the entropy of the 
message, we now ask for the equivocation rate $(1/n)H(M|Z^n)$ to be arbitrarily close
to the entropy rate of the message $(1/n)H(M)$ for sufficiently large codeword length n.
With this relaxed security constraint, it can be shown that there exist channel codes that 
asymptotically guarantee both an arbitrarily small probability of error at the intended 
receiver and secrecy. Such codes are colloquially known as wiretap codes. The maximum 
transmission rate that is achievable under these premises is called the secrecy capacity, 
and can be shown to be strictly positive whenever the eavesdropper’s observation $Z^n$ is
“noisier” than $Y^n$.
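As a numerical illustration of a strictly positive secrecy capacity, consider a special case not spelled out in the text: the main channel is a binary symmetric channel with crossover probability p ≤ 1/2, and Eve observes $Y^n$ through a further binary symmetric channel with crossover probability r. For this degraded binary symmetric wiretap channel, a standard result (stated here without proof) gives $C_s = h_b(p \star r) - h_b(p)$, where $h_b$ is the binary entropy function and $p \star r = p(1-r) + r(1-p)$. The helper names below are hypothetical.

```python
from math import log2

def binary_entropy(p):
    """h_b(p) in bits, with the convention h_b(0) = h_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def cascade(p, r):
    """Crossover probability of BSC(p) followed by BSC(r)."""
    return p * (1 - r) + r * (1 - p)

def degraded_bsc_secrecy_capacity(p, r):
    """C_s = h_b(p * r) - h_b(p) for main channel BSC(p), 0 <= p <= 1/2,
    and an eavesdropper who sees Y through an extra BSC(r)."""
    return binary_entropy(cascade(p, r)) - binary_entropy(p)

print(degraded_bsc_secrecy_capacity(0.1, 0.1))  # about 0.21 bits per channel use
print(degraded_bsc_secrecy_capacity(0.1, 0.0))  # 0.0: Eve sees exactly what Bob sees
```

The capacity is zero when Eve's extra channel is noiseless (r = 0) and positive as soon as her observation is strictly noisier than Bob's.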

In the seventies and eighties, the impact of Wyner’s results was limited due to several 
important obstacles. First, practical code constructions for the wiretap channel were not 
available. Second, the wiretap channel model restricts the eavesdropper by assuming that
she suffers from more noise than is experienced by the legitimate receiver. In addition,
soon after the notion of secrecy capacity appeared, information-theoretic security was
overshadowed by Diffie and Hellman’s seminal work on public-key cryptography, which
relies on mathematical functions believed hard to compute and has dominated security
research since then.

Figure 1.3 Communication over a binary erasure wiretap channel.


Channel coding for secrecy 


Although the previous results on the secrecy capacity prove the existence of codes 
capable of guaranteeing reliable communication while satisfying a secrecy condition, 
it is not immediately clear how such codes can be constructed in practice. Consider 
the channel model illustrated in Figure 1.3, in which Alice wants to send one bit of 
information to Bob over an error-free channel while knowing that Eve’s channel is a
binary erasure channel, which erases an input symbol with probability $\epsilon$. If Alice sends
an uncoded, uniformly distributed bit, then Eve obtains it correctly with probability
$1 - \epsilon$, leading to an equivocation equal to $\epsilon$. It follows that, unless $\epsilon = 1$, Eve is able
to obtain a non-trivial amount of information.

Alternatively, Alice could use an encoder that assigns one or more codewords to each 
of the two possible messages, 0 and 1. Suppose she takes all the binary sequences of 
length n and maps them in such a way that those with even parity correspond to M = 0 
and those with odd parity are assigned to M = 1; to send a message, she transmits one of
its $2^{n-1}$ codewords, chosen uniformly at random. If Bob receives one of these codewords
over the error-free channel, he can obtain the correct message value by determining the
parity of the received codeword. Eve, on the other hand, is left with an average of $n\epsilon$
erasures. As soon as one or more bits are erased, Eve can no longer do better than guessing
the parity of the binary sequence transmitted. This event happens with probability
$1 - (1 - \epsilon)^n$ and it can be shown that

$$H(M|Z^n) \geq 1 - (1 - \epsilon)^n,$$


which goes to unity as n tends to infinity. In other words, for sufficiently large codeword
length, we get an equivocation that is arbitrarily close to the entropy of the message. The
drawback of this coding scheme is that the transmission rate of 1/n also goes to zero as
n grows. Alice and Bob can communicate securely by assigning multiple codewords to the
same message, but the secrecy achieved bears a price in terms of rate.

Figure 1.4 Secret-key agreement from correlated observations.

The intuition developed for the binary erasure wiretap channel should carry over to 
more general models. If Bob’s channel induces fewer errors than Eve’s, Bob should still 
be able to recover messages using a channel code; in contrast, Eve should be left with 
a list of possible codewords and messages. Asymptotic perfect secrecy can be achieved
if this list covers the entire set of messages and if, given the received noisy codeword,
the messages in the list are roughly equally likely. Unfortunately, to this day, practical
wiretap code constructions are known for only a few specific channels.
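The parity scheme for the binary erasure wiretap channel can be checked by brute force for small n. The sketch below assumes, as in the discussion above, that M is a uniform bit, that Alice picks a codeword uniformly among the $2^{n-1}$ sequences with parity M, and that Eve's channel erases each symbol independently with probability $\epsilon$; it computes $H(M|Z^n)$ exactly from the joint distribution. The `None` erasure marker is a representation choice of ours.

```python
from collections import Counter
from itertools import product
from math import log2

ERASED = None  # marker used here for an erased symbol

def parity_code_equivocation(n, eps):
    """Exact H(M | Z^n) for the parity wiretap code over a BEC(eps)."""
    joint = Counter()                      # (m, z) -> probability
    for x in product((0, 1), repeat=n):
        m = sum(x) % 2                     # message carried by codeword x
        p_x = 0.5 / 2 ** (n - 1)           # P[M = m] * P[X = x | M = m]
        for pattern in product((False, True), repeat=n):
            k = sum(pattern)               # number of erased positions
            p_e = eps ** k * (1 - eps) ** (n - k)
            z = tuple(ERASED if erased else b for b, erased in zip(x, pattern))
            joint[(m, z)] += p_x * p_e
    p_z = Counter()
    for (m, z), p in joint.items():
        p_z[z] += p
    # H(M|Z) = -sum over (m, z) of P[m, z] * log2 P[m | z]
    return -sum(p * log2(p / p_z[z]) for (m, z), p in joint.items() if p > 0)

for n in (1, 2, 4, 6):
    print(n, parity_code_equivocation(n, 0.3), 1 - (1 - 0.3) ** n)
```

For n = 1 the result reduces to the uncoded equivocation $\epsilon$, and as n grows the equivocation approaches one bit while the rate 1/n shrinks.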


Secret-key agreement from noisy observations 


If Alice and Bob are willing to settle for generating a secret key instead of communicating 
a secret message straight away, then they can use the noisy channel to generate correlated 
random sequences and subsequently use an error-free communication channel to agree 
on a secret key. Such a situation is illustrated in Figure 1.4, in which Alice, Bob, and
Eve obtain correlated observations $X^n$, $Y^n$, and $Z^n$, respectively; Alice and Bob then
generate a key K on the basis of their respective observations and a set of messages
F exchanged over the error-free channel. In the early nineties, Maurer and, independently,
Ahlswede and Csiszár showed that, even if messages F are made available to the
eavesdropper, Alice and Bob can generate a key oblivious to the eavesdropper such that
$H(K|Z^n F)$ is arbitrarily close to H(K). Provided that authentication is in place, granting
Eve access rights to all feedback messages does not compromise security.

To gain some intuition for why public feedback is useful, consider an instance of 
Wyner’s setup, in which the main channel and the eavesdropper’s channel are binary 
symmetric channels. When Alice transmits a random symbol X over the main channel, 
Bob obtains $Y = X \oplus D$ and Eve observes $Z = X \oplus E$, where D and E are Bernoulli
random variables that correspond to the noise added by the main channel and the
eavesdropper’s channel, respectively. Assume further that Bob’s channel is noisier than
Eve’s, in the sense that P[D = 1] > P[E = 1]. Bob now uses the feedback channel in
the following manner. To send a symbol V, he adds the noisy observation received from
the channel and sends $V \oplus Y = V \oplus X \oplus D$ over the public channel. Since Alice knows
X, she can perform a simple binary addition in order to obtain $V \oplus D$. Eve, on the other
hand, has only a noisy observation Z, and it can be shown that her optimal estimate
of V is $V \oplus Y \oplus Z = V \oplus D \oplus E$. Thus, Alice and Bob effectively transform a wiretap
scenario that is advantageous to Eve into a channel in which she suffers from more errors
than do Alice and Bob.
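A quick simulation makes the effect of public feedback visible. The sketch below follows the steps just described, with hypothetical noise levels P[D = 1] = 0.2 and P[E = 1] = 0.1, and compares the empirical error rates with the exact values: Alice sees V through a channel with crossover P[D = 1], while Eve sees it through one with crossover $P[D \oplus E = 1] = p_D(1 - p_E) + p_E(1 - p_D)$.

```python
import random

def bsc_cascade(p, q):
    """P[D XOR E = 1] for independent D ~ Bernoulli(p) and E ~ Bernoulli(q)."""
    return p * (1 - q) + q * (1 - p)

def simulate(p_d, p_e, trials=100_000, seed=1):
    """Monte Carlo estimate of Alice's and Eve's error rates on Bob's bit V."""
    rng = random.Random(seed)
    alice_err = eve_err = 0
    for _ in range(trials):
        x = rng.getrandbits(1)          # Alice's random symbol X
        d = int(rng.random() < p_d)     # main-channel noise D
        e = int(rng.random() < p_e)     # eavesdropper-channel noise E
        y, z = x ^ d, x ^ e             # Bob's and Eve's observations
        v = rng.getrandbits(1)          # Bob's symbol V
        public = v ^ y                  # V XOR Y sent over the public channel
        alice_err += (public ^ x) != v  # Alice recovers V XOR D
        eve_err += (public ^ z) != v    # Eve's estimate V XOR D XOR E
    return alice_err / trials, eve_err / trials

p_d, p_e = 0.2, 0.1                      # Bob noisier than Eve, as in the text
print(simulate(p_d, p_e))                # roughly (0.20, 0.26)
print(p_d, bsc_cascade(p_d, p_e))        # theory: 0.2 and about 0.26
```

Although Eve's original channel was the better one, on the feedback bit her error rate exceeds Alice's whenever both noise levels lie strictly between 0 and 1/2.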

From a practical perspective, the design of key-agreement schemes from correlated 
observations turns out to be a simpler problem than the construction of codes for the 
wiretap channel. In fact, a wiretap code needs to guarantee simultaneously reliable 
communication to the legitimate receiver and security against the eavesdropper. On 
the other hand, since a key does not carry any information in itself, the reliability and 
security constraints can be handled separately. For instance, Alice would first send error-correction
data to Bob, in the form of parity bits, which would allow him to correct the
bit flips caused by the noise in the channel. Even if the error-correcting bits are sent over 
the public channel, the fact that Eve’s observation contains more errors than Bob’s is 
sufficient to guarantee that she is unable to arrive at the same sequence as Alice and Bob. 
Alice and Bob would then use a well-chosen hash function to transform their common 
sequence of symbols into a much shorter key and, because of her errors, Eve is unable 
to predict the output of the hash. Finally, the key would be used as a one-time pad to 
ensure information-theoretic security. 
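The two steps, reconciliation followed by hashing, can be sketched as follows. This is only an illustration under hypothetical assumptions, not the book's construction: reconciliation uses the syndromes of the (7,4) Hamming code, which can fix at most one bit flip per 7-bit block, and privacy amplification multiplies the reconciled string by a random binary matrix (a universal family of hash functions) whose seed is assumed to be public. The key length of 32 bits is arbitrary; a real system must size it from the bits revealed by the syndromes and from Eve's channel.

```python
import random

# Parity-check matrix of the (7,4) Hamming code: column j holds the binary
# expansion of j + 1, so a single flip at position j has syndrome value j + 1.
H = [[((j + 1) >> i) & 1 for j in range(7)] for i in range(3)]

def syndrome(block):
    return [sum(h * b for h, b in zip(row, block)) % 2 for row in H]

def reconcile(alice, bob):
    """Bob corrects his string from Alice's public syndromes (one flip per block)."""
    corrected = []
    for i in range(0, len(alice), 7):
        a, b = alice[i:i + 7], list(bob[i:i + 7])
        s = [x ^ y for x, y in zip(syndrome(a), syndrome(b))]  # syndrome of a XOR b
        pos = s[0] + 2 * s[1] + 4 * s[2]   # 0 means no error detected
        if pos:
            b[pos - 1] ^= 1
        corrected.extend(b)
    return corrected

def privacy_amplification(bits, key_len, seed):
    """Hash with a random binary matrix drawn from a public seed."""
    rng = random.Random(seed)
    matrix = [[rng.getrandbits(1) for _ in bits] for _ in range(key_len)]
    return [sum(m * b for m, b in zip(row, bits)) % 2 for row in matrix]

rng = random.Random(7)
n_blocks = 20
alice = [rng.getrandbits(1) for _ in range(7 * n_blocks)]
bob = list(alice)
for blk in range(n_blocks):                # hypothetical noise: one flip per block
    bob[7 * blk + rng.randrange(7)] ^= 1

bob_fixed = reconcile(alice, bob)
assert bob_fixed == alice
# The public syndromes reveal 3 bits per block, so the key must be shorter than
# 4 * n_blocks = 80 bits and must also discount what Eve's own noisy bits reveal.
key_alice = privacy_amplification(alice, key_len=32, seed=2024)
key_bob = privacy_amplification(bob_fixed, key_len=32, seed=2024)
assert key_alice == key_bob
```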


Active attacks 


Thus far, we have assumed that Eve is a passive eavesdropper, who wishes to extract as 
much information as possible from the signals traversing the channel. However, if she 
can afford the risk of being detected by the legitimate partners, she has a wide range 
of active attacks at her disposal. Eve could impersonate Alice or Bob to cause further 
confusion, intercept and forge messages that are sent over the noisy channels and the 
error-free public channel, or simply send jamming signals to perturb the communication. 

Sender authentication is a tacit assumption in most contributions in the area of 
information-theoretic security. Except in special and rare instances of the wiretap sce- 
nario, a shared secret in the form of a small key is necessary to authenticate the first 
transmissions. Subsequent messages can be authenticated using new keys that can be 
generated at the physical layer using some of the methods in this book. Alternatively, 
if Alice and Bob are communicating over a wireless channel, then they can sometimes 
exploit the reciprocity of the channel to their advantage. The receiver can associate a cer- 
tain channel impulse response with a certain transmitter and it is practically impossible 
for an adversary located at a different position to be able to generate a similar impulse 
response at the receiver. 

With authentication in place, it is impossible for the attacker to impersonate the legit- 
imate partners and to forge messages. However, the attacker may decide to obstruct the 
communication by means of jamming. This can be done in a blind manner by transmitting
noise, or in a more elaborate fashion exploiting all the available information on the
codes and the signals used by the legitimate partners. It is worth pointing out that the use 
of jamming is not restricted to active attackers. Cooperative jamming techniques,
by which one or more legitimate transmitters send coded jamming signals to increase 
the confusion of the attacker, can be used effectively to increase the secrecy capacity in 
multi-user channels. Sophisticated signal processing, most notably through the use of 
multiple antennas, can also further enhance the aforementioned security benefits. 


Physical-layer security and classical cryptography 


There are many fundamental differences between the classical cryptographic primi- 
tives used at higher layers of the protocol stack and physical-layer security based on 
information-theoretic principles. It is therefore important to understand what these dif- 
ferences are and how they affect the choice of technology in a practical scenario. 

Classical computational security uses public-key cryptography for authentication and 
secret-key distribution and symmetric encryption for the protection of transmitted data. 
The combination of state-of-the-art algorithms like RSA and the Advanced Encryption 
Standard (AES) is deemed secure for a large number of applications because so far no 
efficient attacks on public-key systems are publicly known. Many symmetric ciphers 
were broken in the past, but those that were compromised were consistently replaced by 
new algorithms, whose cryptanalysis is more difficult and requires more computational 
effort. Under the assumption that the attacker cannot break hard cryptographic primitives, 
it is possible to design systems that are secure with probability one. The technology is 
readily available and inexpensive. 

However, there are also disadvantages to the computational model. The security of 
public-key cryptography is based on the conjecture that certain one-way functions are 
hard to invert, which remains unproven from a mathematical point of view. Computing 
power continues to increase at a very fast pace, such that brute-force attacks that were 
once deemed unfeasible are now within reach. Moreover, there are no precise metrics to 
compare the strengths of different ciphers in a rigorous way. In general, the security of a 
cryptographic protocol is measured by whether it survives a set of attacks or not. From 
the works of Shannon and Wyner, one concludes that the ruling cryptographic paradigm 
can never provide information-theoretic security, because the communication channel 
between the friendly parties and the eavesdropper is noiseless and the secrecy capacity 
is zero. Moreover, existing key-distribution schemes based on the computational model 
require a trusted third party as well as complex protocols and system architectures. If 
multiple keys are to be generated, it is usually possible to do so only from a single shared 
secret and at the price of reduced data protection. 

The main advantages of physical-layer security under the information-theoretic secu- 
rity model come from the facts that no computational restrictions are placed on the 
eavesdropper and that very precise statements can be made about the information that is 
leaked to the eavesdropper as a function of the channel quality. Physical-layer security 
has already been realized in practice through quantum key distribution and, in theory, 
suitably long codes can come exponentially close to perfect secrecy. The system
architecture for security is basically the same as the one for communication. Instead of
distributing keys, it is possible to generate on-the-fly as many secret keys as desired. 

However, we must accept some disadvantages as well. First and foremost, infor- 
mation-theoretic security relies on average information measures. The system can be 
designed and tuned for a specific level of security, claiming for instance that with very 
high probability a block will be secure; however, it might not be possible to guarantee 
confidentiality with probability one. We are also forced to make assumptions about the 
communication channels that might not be accurate in practice. In most cases, one would 
make very conservative assumptions about the channels, which is likely to result in low 
secrecy capacities or low secret-key or message exchange rates. A few systems have 
been deployed, most notably for optical communication, but the technology is not very 
widely available and is still expensive. 

In light of the brief comparisons above, it is likely that any deployment of a physical- 
layer security protocol in a classical system would be part of a layered security solution 
whereby confidentiality and authentication are provided at a number of different layers, 
each with a specific goal in mind. This modular approach is how virtually all systems 
are designed, so, in this context, physical-layer security provides an additional layer of 
security that does not yet exist in communication networks. 


Outline of the rest of the book 


The main objective of this book is to lay out the theoretical foundations of physical-layer 
security and to provide practical tools for implementing it in real systems. The different 
chapters cover essential theory and mathematical models for assessing physical-layer 
security and characterizing its fundamental limits, coding schemes for data security at 
the physical layer, and system aspects of physical-layer security. 

Chapter 2 summarizes fundamental notions of information theory required in order 
to understand subsequent chapters. Our presentation emphasizes the mathematical tools 
and notions of particular relevance to physical-layer security. 

Chapter 3 introduces the seminal results regarding secrecy capacity for communica- 
tion channels, highlighting the mathematical techniques used in the derivations. 

Chapter 4 focuses on the fundamental limits and methodologies of secret-key agree- 
ment, including the reconciliation of correlated sequences and how privacy amplification 
allows strong secrecy. 

Chapter 5 discusses the fundamental limits of secure communication over Gaussian 
and wireless channels. 

Chapter 6 covers some of the techniques used to achieve physical-layer security in 
practice, including the design of codes for wiretap channels as well as the construction 
of codes for secret-key agreement. 

Chapter 7 addresses system issues related to the integration of physical-layer secu- 
rity in contemporary communications architectures and gives examples of practical 
applications. 
Figure 1.5 Dependences between chapters.


Chapter 8 discusses physical-layer security in multi-user systems and shows how the 
secrecy rates of multi-terminal networks can be increased through the appropriate use 
of feedback, cooperation, and jamming. 

Chapter 9 deals with network-coding security. Although it is not necessarily imple- 
mented at the physical layer, network coding combines aspects of information and coding 
theory with close connections to information-theoretic security. By allowing interme- 
diate nodes to mix different information flows through non-trivial operations, network 
coding offers a number of security challenges and opportunities. 

The dependences between the chapters of the book are illustrated in Figure 1.5. The 
reader familiar with the tools and techniques of information theory can probably skip 
Chapter 2 and start at Chapter 3. The fundamental concepts and results of information- 
theoretic security are presented in Chapter 3 and Chapter 4 and are leveraged in Chapter 5 
and Chapter 8 to study specific applications. Chapter 6, Chapter 7, and Chapter 9 rely 
on the notions introduced in earlier chapters but can be read independently. 




Fundamentals of information theory 


We begin with a brief overview of some of the fundamental concepts and mathematical 
tools of information theory. This allows us to establish notation and to set the stage for 
the results presented in subsequent chapters. For a comprehensive introduction to the 
fundamental concepts and methods of information theory, we refer the interested reader 
to the textbooks of Gallager [2], Cover and Thomas [3], Yeung [4], and Csiszár and
Körner [5].

The rest of the chapter is organized as follows. Section 2.1 provides an overview 
of the basic mathematical tools and metrics that are relevant for subsequent chapters. 
Section 2.2 illustrates the fundamental proof techniques used in information theory by 
discussing the point-to-point communication problem and Shannon’s coding theorems. 
Section 2.3 is entirely devoted to network information theory, with a special emphasis on 
distributed source coding and multi-user communications as they relate to information- 
theoretic security. 


2.1 Mathematical tools of information theory


The following subsections describe a powerful set of metrics and tools that are useful 
to characterize the fundamental limits of communication systems. All results are stated 
without proof through a series of lemmas and theorems, and we refer the reader to 
standard textbooks [2, 3, 4] for details. Unless specified otherwise, all random variables
and random vectors used throughout this book are real-valued.


2.1.1 Useful bounds


We start by recalling a few inequalities that are useful to bound the probabilities of rare 
events. 


Lemma 2.1 (Markov's inequality). Let X be a non-negative real-valued random variable. Then,

$$\forall a>0\quad\mathbb{P}[X\geq a]\leq\frac{\mathbb{E}_X[X]}{a}.$$
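As a quick sanity check of Lemma 2.1, the following Python sketch compares the exact tail probability of a small discrete random variable with Markov's bound. The distribution (X uniform on {0, ..., 9}) and the function names are hypothetical choices made purely for illustration; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def expectation(pmf):
    """Mean E[X] of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def tail_probability(pmf, a):
    """Exact tail probability P[X >= a]."""
    return sum((p for x, p in pmf.items() if x >= a), Fraction(0))


def markov_bound(pmf, a):
    """Markov's upper bound E[X]/a on P[X >= a] (X non-negative, a > 0)."""
    if a <= 0:
        raise ValueError("a must be positive")
    if any(x < 0 for x in pmf):
        raise ValueError("X must be non-negative")
    return expectation(pmf) / a


# Hypothetical example: X uniform on {0, 1, ..., 9}, so that E[X] = 9/2.
pmf = {x: Fraction(1, 10) for x in range(10)}
for a in (1, 3, 5, 9):
    print(f"a={a}: P[X>=a]={tail_probability(pmf, a)}, bound={markov_bound(pmf, a)}")
```

The bound is loose for small a (it even exceeds one for a < E[X]), which is why it is mostly used, as in Lemma 2.2, to control the probability of rare events.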
The following consequence of Markov's inequality is particularly useful.


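Lemma 2.2 (Selection lemma). Let $X_n\in\mathcal{X}_n$ be a random variable and let $\mathcal{F}$ be a finite
set of functions $f:\mathcal{X}_n\to\mathbb{R}^+$ such that $|\mathcal{F}|$ does not depend on n and

$$\forall f\in\mathcal{F}\quad\mathbb{E}_{X_n}[f(X_n)]\leq\delta(n).$$

Then, there exists a specific realization $x_n$ of $X_n$ such that

$$\forall f\in\mathcal{F}\quad f(x_n)\leq\delta(n).$$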


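Proof. Let $\epsilon_n=\delta(n)$. Using the union bound and Markov's inequality, we obtain

$$\begin{aligned}
\mathbb{P}_{X_n}\Big[\bigcup_{f\in\mathcal{F}}\{f(X_n)\geq(|\mathcal{F}|+1)\epsilon_n\}\Big]
&\leq\sum_{f\in\mathcal{F}}\mathbb{P}_{X_n}\big[f(X_n)\geq(|\mathcal{F}|+1)\epsilon_n\big]\\
&\leq\sum_{f\in\mathcal{F}}\frac{\mathbb{E}_{X_n}[f(X_n)]}{(|\mathcal{F}|+1)\epsilon_n}\\
&\leq\frac{|\mathcal{F}|}{|\mathcal{F}|+1}\\
&<1.
\end{aligned}$$

Therefore, there exists at least one realization $x_n$ of $X_n$ such that

$$\forall f\in\mathcal{F}\quad f(x_n)<(|\mathcal{F}|+1)\epsilon_n.$$

Since $\epsilon_n=\delta(n)$ and $|\mathcal{F}|$ is finite and independent of n, we can write $(|\mathcal{F}|+1)\epsilon_n$
as $\delta(n)$.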


We call Lemma 2.2 the "selection lemma" because it tells us that, if $\mathbb{E}_{X_n}[f(X_n)]\leq\delta(n)$
for all $f\in\mathcal{F}$, we can select a specific realization $x_n$ such that $f(x_n)\leq\delta(n)$ for
all $f\in\mathcal{F}$. Two other useful consequences of Markov's inequality are Chebyshev's
inequality and Chernov bounds.
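The argument in the proof of Lemma 2.2 can be mimicked on a toy example. The Python sketch below takes a finite distribution and a finite family of non-negative functions, sets $\epsilon$ to the largest expectation, and searches for a realization x with $f(x)<(|\mathcal{F}|+1)\epsilon$ for every f; the union bound guarantees that the search succeeds. The distribution and the two indicator functions are hypothetical stand-ins for "bad events".

```python
from fractions import Fraction


def select_realization(pmf, functions):
    """Find x with p(x) > 0 and f(x) < (|F| + 1) * eps for every f in F,
    where eps = max_f E[f(X)], mirroring the proof of Lemma 2.2.
    All functions are assumed to be non-negative."""
    eps = max(sum(p * f(x) for x, p in pmf.items()) for f in functions)
    threshold = (len(functions) + 1) * eps
    for x, p in pmf.items():
        if p == 0:
            continue
        # If eps == 0, every non-negative f vanishes on the support of X.
        if eps == 0 or all(f(x) < threshold for f in functions):
            return x, threshold
    raise AssertionError("impossible: the union bound guarantees a realization")


def f1(x):
    """Hypothetical bad event {X < 3}, expectation 3/10."""
    return 1 if x < 3 else 0


def f2(x):
    """Hypothetical bad event {X >= 7}, expectation 3/10."""
    return 1 if x >= 7 else 0


pmf = {x: Fraction(1, 10) for x in range(10)}
x_star, threshold = select_realization(pmf, [f1, f2])
print(x_star, threshold)
```

In the source coding proof later in this chapter, the role of X is played by a randomly generated code and f is its probability of error.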


Lemma 2.3 (Chebyshev's inequality). Let X be a real-valued random variable. Then,

$$\forall a>0\quad\mathbb{P}\big[|X-\mathbb{E}_X[X]|\geq a\big]\leq\frac{\mathrm{Var}(X)}{a^2}.$$


Lemma 2.4 (Chernov bounds). Let X be a real-valued random variable. Then, for all
$a>0$,

$$\forall s>0\quad\mathbb{P}[X\geq a]\leq\mathbb{E}_X[e^{sX}]\,e^{-sa},$$

$$\forall s<0\quad\mathbb{P}[X\leq a]\leq\mathbb{E}_X[e^{sX}]\,e^{-sa}.$$
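The Chebyshev and Chernov bounds can be compared on a concrete example. The Python sketch below assumes a hypothetical $X\sim\mathrm{Binomial}(100,1/2)$ and a = 75, and evaluates the exact tail $\mathbb{P}[X\geq a]$, the Chebyshev bound (through $\mathbb{P}[X\geq a]\leq\mathbb{P}[|X-\mathbb{E}[X]|\geq a-\mathbb{E}[X]]$), and the Chernov bound with $\mathbb{E}[e^{sX}]=(1-p+pe^s)^n$. The minimization over s is approximated on a grid, which is our own simplification; since every s > 0 yields a valid bound, the result is still an upper bound.

```python
import math


def binomial_upper_tail(n, p, a):
    """Exact P[X >= a] for X ~ Binomial(n, p)."""
    start = max(0, math.ceil(a))
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(start, n + 1))


def chebyshev_bound(n, p, a):
    """For a > E[X] = np: P[X >= a] <= P[|X - np| >= a - np] <= Var(X) / (a - np)^2."""
    mean, var = n * p, n * p * (1 - p)
    if a <= mean:
        raise ValueError("the bound is only informative for a > np")
    return min(1.0, var / (a - mean) ** 2)


def chernov_bound(n, p, a, s_max=10.0, steps=2000):
    """Approximate min over s in (0, s_max] of E[e^{sX}] e^{-sa}."""
    best_log = 0.0  # log of the trivial bound 1
    for i in range(1, steps + 1):
        s = s_max * i / steps
        # Work in the log domain to avoid overflow of (1 - p + p e^s)^n.
        log_value = n * math.log(1 - p + p * math.exp(s)) - s * a
        best_log = min(best_log, log_value)
    return math.exp(best_log)


n, p, a = 100, 0.5, 75
print(binomial_upper_tail(n, p, a), chernov_bound(n, p, a), chebyshev_bound(n, p, a))
```

Chebyshev's inequality only decays polynomially in the deviation, whereas the Chernov bound decays exponentially in n, which is what makes it the tool of choice for sums of i.i.d. random variables.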


2.1.2 Entropy and mutual information


In this section, we define a series of useful information-theoretic quantities, whose 
operational significance will become clear in the next section. 
Definition 2.1. Let X and X' be two discrete random variables defined on the same
alphabet $\mathcal{X}$. The variational distance between X and X' is

$$\mathbb{V}(X,X')\triangleq\sum_{x\in\mathcal{X}}|p_X(x)-p_{X'}(x)|.$$


Definition 2.2. Let $X\in\mathcal{X}$ be a discrete random variable with distribution $p_X$. The
Shannon entropy (or entropy for short) of X is defined as

$$H(X)\triangleq-\sum_{x\in\mathcal{X}}p_X(x)\log p_X(x),$$


with the convention that $0\log0\triangleq0$. Unless specified otherwise, all logarithms are taken
to the base two and the unit for entropy is called a bit.


If $\mathcal{X}=\{0,1\}$, then X is a binary random variable and its entropy depends solely on
the parameter $p=\mathbb{P}[X=0]$. The binary entropy function is defined as

$$H_b(p)\triangleq-p\log p-(1-p)\log(1-p).$$
Lemma 2.5. For any discrete random variable $X\in\mathcal{X}$,

$$0\leq H(X)\leq\log|\mathcal{X}|.$$

The equality H(X) = 0 holds if and only if X is a constant while the equality $H(X)=\log|\mathcal{X}|$
holds if and only if X is uniform on $\mathcal{X}$.
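Definition 2.2 and Lemma 2.5 translate directly into a few lines of Python. The sketch below computes the entropy in bits of a distribution stored as a dictionary, applying the convention $0\log0=0$ by skipping zero probabilities, and checks the two extreme cases of Lemma 2.5 on hypothetical four-letter distributions.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of {symbol: probability}, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    return entropy({0: p, 1: 1 - p})


uniform4 = {x: 0.25 for x in range(4)}
skewed4 = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}
constant = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
for name, pmf in [("uniform", uniform4), ("skewed", skewed4), ("constant", constant)]:
    # Lemma 2.5: 0 <= H(X) <= log|X|, with equality at the two extremes.
    print(name, entropy(pmf), math.log2(len(pmf)))
```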


Conceptually, H(X) can be viewed as a measure of the average amount of information 
contained in X or, equivalently, the amount of uncertainty that subsists until the outcome 
of X is revealed. 


Proposition 2.1 (Csiszár and Körner). Let X and X' be two discrete random variables
defined on the same alphabet $\mathcal{X}$ such that $\mathbb{V}(X,X')\leq1/2$. Then,

$$|H(X)-H(X')|\leq\mathbb{V}(X,X')\log\frac{|\mathcal{X}|}{\mathbb{V}(X,X')}.$$
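Proposition 2.1 can be spot-checked numerically. The Python sketch below draws random pairs of distributions on a five-letter alphabet whose variational distance (Definition 2.1, without a factor 1/2) is at most 1/2, and verifies the inequality on each pair. The sampling scheme (mixing a random distribution p with another random distribution r) and the seed are arbitrary choices for illustration.

```python
import math
import random


def entropy(pmf):
    """Entropy in bits of a list of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def variational_distance(p, q):
    """V(X, X') = sum_x |p_X(x) - p_X'(x)| as in Definition 2.1 (no factor 1/2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def csiszar_korner_bound(p, q):
    """Right-hand side V log(|X| / V) of Proposition 2.1 (requires V <= 1/2)."""
    v = variational_distance(p, q)
    if v > 0.5:
        raise ValueError("Proposition 2.1 requires V(X, X') <= 1/2")
    return 0.0 if v == 0 else v * math.log2(len(p) / v)


def random_pmf(size, rng):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
for _ in range(1000):
    p, r = random_pmf(5, rng), random_pmf(5, rng)
    t = 0.25 * rng.random()  # V(p, q) = t * V(p, r) <= 2t <= 1/2
    q = [(1 - t) * a + t * b for a, b in zip(p, r)]
    assert abs(entropy(p) - entropy(q)) <= csiszar_korner_bound(p, q) + 1e-12
print("Proposition 2.1 held on 1000 random pairs")
```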


Definition 2.3. Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two discrete random variables with joint
distribution $p_{XY}$. The joint entropy of X and Y is defined as

$$H(XY)\triangleq-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p_{XY}(x,y)\log p_{XY}(x,y).$$


Definition 2.4. Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two discrete random variables with joint
distribution $p_{XY}$. The conditional entropy of Y given X is defined as

$$H(Y|X)\triangleq-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p_{XY}(x,y)\log p_{Y|X}(y|x).$$


By expanding H(XY) with Bayes' rule, one can verify that

$$H(XY)=H(X)+H(Y|X).$$
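The identity $H(XY)=H(X)+H(Y|X)$ is easy to confirm numerically. The Python sketch below computes both sides for a hypothetical joint distribution on $\{0,1\}\times\{0,1,2\}$, evaluating H(Y|X) exactly as in Definition 2.4.

```python
import math
from collections import defaultdict


def H(pmf):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal_x(pxy):
    """Marginal p_X of a joint distribution {(x, y): probability}."""
    px = defaultdict(float)
    for (x, _), p in pxy.items():
        px[x] += p
    return dict(px)


def conditional_entropy_y_given_x(pxy):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), as in Definition 2.4."""
    px = marginal_x(pxy)
    return -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)


# Hypothetical joint distribution on {0,1} x {0,1,2}.
pxy = {(0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
       (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.2}
print(H(pxy), H(marginal_x(pxy)) + conditional_entropy_y_given_x(pxy))
```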




This expansion generalizes to the entropy of a random vector $X^n=(X_1,\dots,X_n)$ as

$$H(X^n)=H(X_1)+H(X_2|X_1)+\cdots+H(X_n|X^{n-1})=\sum_{i=1}^nH(X_i|X^{i-1}),$$

with the convention that $H(X_1|X^0)\triangleq H(X_1)$. This expansion is known as the chain rule
of entropy.


Lemma 2.6 ("Conditioning does not increase entropy"). Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two
discrete random variables with joint distribution $p_{XY}$. Then,

$$H(X|Y)\leq H(X).$$


In other words, this lemma asserts that knowledge of Y cannot increase our uncertainty 
about X. 


Definition 2.5. Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two discrete random variables with joint
distribution $p_{XY}$. The mutual information between X and Y is defined as

$$I(X;Y)\triangleq H(X)-H(X|Y).$$

Let $X\in\mathcal{X}$, $Y\in\mathcal{Y}$, and $Z\in\mathcal{Z}$ be discrete random variables with joint distribution
$p_{XYZ}$. The conditional mutual information between X and Y given Z is

$$I(X;Y|Z)\triangleq H(X|Z)-H(X|YZ).$$


Intuitively, I(X;Y) represents the reduction in uncertainty about X that results from the
observation of Y. By using the chain rule of entropy, one can expand the mutual information
between a random vector $X^n=(X_1,\dots,X_n)$ and a random variable Y as

$$I(X^n;Y)=\sum_{i=1}^nI(X_i;Y|X^{i-1}),$$

with the convention that $I(X_1;Y|X^0)\triangleq I(X_1;Y)$. This expansion is known as the chain
rule of mutual information.


Lemma 2.7. Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two discrete random variables with joint distribution
$p_{XY}$. Then,

$$0\leq I(X;Y)\leq\min(H(X),H(Y)).$$

The equality I(X;Y) = 0 holds if and only if X and Y are independent. The equality
I(X;Y) = H(X) (I(X;Y) = H(Y)) holds if and only if X is a function of Y (Y is a function
of X).
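To make Definition 2.5 and Lemma 2.7 concrete, the Python sketch below computes I(X;Y) as $H(X)-H(X|Y)$, using $H(X|Y)=H(XY)-H(Y)$, for a uniform input sent over a binary symmetric channel (defined formally in Example 2.3 later in this chapter), for a noiseless binary channel, and for a pair of independent random variables. The parameters are hypothetical; for the BSC with uniform input the result should coincide with $1-H_b(p)$.

```python
import math
from collections import defaultdict


def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginals(pxy):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)


def mutual_information(pxy):
    """I(X;Y) = H(X) - H(X|Y), using H(X|Y) = H(XY) - H(Y) (chain rule)."""
    px, py = marginals(pxy)
    return entropy(px) - (entropy(pxy) - entropy(py))


def bsc_joint(p):
    """Joint distribution of a uniform input and the output of a BSC(p)."""
    return {(x, y): 0.5 * (1 - p if x == y else p) for x in (0, 1) for y in (0, 1)}


def binary_entropy(p):
    return entropy({0: p, 1: 1 - p})


# X uniform on {0,1} and Y uniform on {0,1,2,3}, drawn independently.
independent = {(x, y): 0.5 * 0.25 for x in (0, 1) for y in range(4)}
print(mutual_information(bsc_joint(0.11)), 1 - binary_entropy(0.11))
print(mutual_information(bsc_joint(0.0)), mutual_information(independent))
```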


Lemma 2.8. Let $X\in\mathcal{X}$, $Y\in\mathcal{Y}$, and $Z\in\mathcal{Z}$ be three discrete random variables with
joint distribution $p_{XYZ}$. Then,

$$0\leq I(X;Y|Z)\leq\min(H(X|Z),H(Y|Z)).$$

The equality I(X;Y|Z) = 0 holds if and only if X and Y are conditionally independent
given Z. In this case, we say that $X\rightarrow Z\rightarrow Y$ forms a Markov chain. The equality
I(X;Y|Z) = H(X|Z) (I(X;Y|Z) = H(Y|Z)) holds if and only if X is a function of Y and
Z (Y is a function of X and Z).


Lemma 2.9 (Data-processing inequality). Let $X\in\mathcal{X}$, $Y\in\mathcal{Y}$, and $Z\in\mathcal{Z}$ be three
discrete random variables such that $X\rightarrow Y\rightarrow Z$ forms a Markov chain. Then,

$$I(X;Y)\geq I(X;Z).$$

An equivalent form of the data-processing inequality is $H(X|Y)\leq H(X|Z)$, which
means that, on average, processing Y can only increase our uncertainty about X.
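A simple way to see Lemma 2.9 at work is a cascade of two binary symmetric channels: Y is the output of a first BSC fed with X, and Z is obtained by passing Y through a second, independent BSC, so that $X\rightarrow Y\rightarrow Z$ is a Markov chain. The Python sketch below computes I(X;Y) and I(X;Z) exactly; the input distribution and crossover probabilities are hypothetical.

```python
import math
from itertools import product


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bsc(p):
    """Transition probabilities W[x][y] of a BSC(p)."""
    return [[1 - p, p], [p, 1 - p]]


def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(AB) from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())


def cascade(px, w1, w2):
    """Joint pmfs of (X, Y) and (X, Z) for the Markov chain X -> Y -> Z, where Y is
    the output of channel w1 fed with X and Z the output of channel w2 fed with Y."""
    pxy, pxz = {}, {}
    for x, y, z in product(range(2), repeat=3):
        p = px[x] * w1[x][y] * w2[y][z]
        pxy[(x, y)] = pxy.get((x, y), 0.0) + p
        pxz[(x, z)] = pxz.get((x, z), 0.0) + p
    return pxy, pxz


# Hypothetical parameters: non-uniform input, BSC(0.1) followed by BSC(0.2).
pxy, pxz = cascade([0.3, 0.7], bsc(0.1), bsc(0.2))
print(mutual_information(pxy), mutual_information(pxz))
```

When the second channel is noiseless, Z = Y and the two quantities coincide, which is the equality case of the inequality.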


Lemma 2.10 (Fano's inequality). Let $X\in\mathcal{X}$ be a discrete random variable and let X'
be any estimate of X that takes values in the same alphabet $\mathcal{X}$. Let $P_e=\mathbb{P}[X\neq X']$ be
the probability of error obtained when estimating X with X'. Then,

$$H(X|X')\leq H_b(P_e)+P_e\log(|\mathcal{X}|-1),$$

where $H_b(P_e)$ is the binary entropy function defined earlier.


Fano's inequality is the key ingredient of many proofs in this book because it relates
an information-theoretic quantity (the conditional entropy H(X|X')) to an operational
quantity (the probability of error $P_e$). In what follows, we often write Fano's inequality
in the form $H(X|X')\leq\delta(P_e)$ to emphasize that H(X|X') goes to zero if $P_e$ goes to zero.
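The Python sketch below evaluates both sides of Fano's inequality for two hypothetical estimators of a uniform X on four letters, each wrong with probability q. When the errors are spread uniformly over the other letters the inequality holds with equality; when every error points to the same wrong letter, the uncertainty left in X is only $H_b(q)$ and the inequality is strict.

```python
import math


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|X') from {(x, x_hat): probability}, via H(X X') - H(X')."""
    marginal = {}
    for (_, xh), p in joint.items():
        marginal[xh] = marginal.get(xh, 0.0) + p
    return entropy(joint.values()) - entropy(marginal.values())


def fano_bound(pe, alphabet_size):
    """Right-hand side H_b(P_e) + P_e log(|X| - 1) of Fano's inequality."""
    return entropy([pe, 1 - pe]) + pe * math.log2(alphabet_size - 1)


def estimate_joint(q, size, spread_errors):
    """Hypothetical estimator: X uniform on {0, ..., size-1}; X' = X w.p. 1 - q.
    On error, X' is uniform over the other letters (spread_errors=True)
    or always equal to (X + 1) mod size (spread_errors=False)."""
    joint = {}
    for x in range(size):
        joint[(x, x)] = (1 - q) / size
        for xh in range(size):
            if xh == x:
                continue
            if spread_errors:
                joint[(x, xh)] = q / (size * (size - 1))
            elif xh == (x + 1) % size:
                joint[(x, xh)] = q / size
    return joint


q, size = 0.2, 4
for spread in (True, False):
    joint = estimate_joint(q, size, spread)
    pe = sum(p for (x, xh), p in joint.items() if x != xh)
    print(spread, conditional_entropy(joint), fano_bound(pe, size))
```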


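Definition 2.6. A function $f:\mathcal{I}\to\mathbb{R}$ defined on a set $\mathcal{I}$ is convex on $\mathcal{I}$ if, for all
$(x_1,x_2)\in\mathcal{I}^2$ and for all $\lambda\in[0,1]$,

$$f(\lambda x_1+(1-\lambda)x_2)\leq\lambda f(x_1)+(1-\lambda)f(x_2).$$

If the inequality above is strict whenever $x_1\neq x_2$ and $\lambda\in(0,1)$, f is strictly convex on $\mathcal{I}$. A function
$f:\mathcal{I}\to\mathbb{R}$ defined on a set $\mathcal{I}$ is (strictly) concave on $\mathcal{I}$ if the function $-f$ is (strictly)
convex on $\mathcal{I}$.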


Lemma 2.11 (Jensen's inequality). Let $X\in\mathcal{X}$ be a random variable and let $f:\mathcal{X}\to\mathbb{R}$
be a convex function. Then,

$$\mathbb{E}[f(X)]\geq f(\mathbb{E}[X]).$$

If f is strictly convex, then equality holds if and only if X is a constant.


Lemma 2.12. Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be two discrete random variables with joint
distribution $p_{XY}$. Then,

• H(X) is a concave function of $p_X$;
• I(X;Y) is a concave function of $p_X$ for $p_{Y|X}$ fixed;
• I(X;Y) is a convex function of $p_{Y|X}$ for $p_X$ fixed.


If X is a continuous random variable then, in general, H(X) is not well defined. It is 
convenient to define the differential entropy as follows. 
Definition 2.7. Let $X\in\mathcal{X}$ be a continuous random variable with probability density function $p_X$. The
differential entropy of X is defined as

$$h(X)\triangleq-\int_{\mathcal{X}}p_X(x)\log p_X(x)\,dx.$$


The notions of joint entropy, conditional entropy, and mutual information for continu- 
ous random variables are identical to their discrete counterparts, but with the differential 
entropy h in place of the entropy H. With the exception of Lemma 2.5 and Fano’s 
inequality, all of the properties of entropy and mutual information stated above hold also 
for continuous random variables. In addition, the following properties will be useful in 
the next chapters. 


Lemma 2.13. If $X\in\mathcal{X}$ is a continuous random variable with variance $\mathrm{Var}(X)\leq\sigma^2$,
then

$$h(X)\leq\frac{1}{2}\log(2\pi e\sigma^2),$$

with equality if and only if X has a Gaussian distribution with variance $\sigma^2$.
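Lemma 2.13 can be illustrated by comparing the differential entropies of three distributions with the same variance $\sigma^2$. The Python sketch below uses the standard closed forms for the uniform distribution (width w, variance $w^2/12$, $h=\log w$) and the Laplace distribution (scale b, variance $2b^2$, $h=\log(2eb)$), and cross-checks the Gaussian value by a midpoint-rule integration of Definition 2.7. The choice of comparison distributions and the integration parameters are ours.

```python
import math


def gaussian_entropy(sigma2):
    """h(X) = (1/2) log2(2 pi e sigma^2) bits for X ~ N(0, sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma2)


def uniform_entropy(sigma2):
    """Uniform on an interval of width w has variance w^2/12 and h = log2(w)."""
    return math.log2(math.sqrt(12 * sigma2))


def laplace_entropy(sigma2):
    """Laplace with scale b has variance 2 b^2 and h = log2(2 e b)."""
    b = math.sqrt(sigma2 / 2)
    return math.log2(2 * math.e * b)


def numerical_gaussian_entropy(sigma2, points=20_001, width=12.0):
    """Midpoint-rule evaluation of -int p log2 p over [-width*sigma, width*sigma]."""
    sigma = math.sqrt(sigma2)
    lo, hi = -width * sigma, width * sigma
    dx = (hi - lo) / points
    total = 0.0
    for i in range(points):
        x = lo + (i + 0.5) * dx
        p = math.exp(-x * x / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        if p > 0:
            total -= p * math.log2(p) * dx
    return total


sigma2 = 1.0
print(gaussian_entropy(sigma2), uniform_entropy(sigma2), laplace_entropy(sigma2))
print(numerical_gaussian_entropy(sigma2))
```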


Lemma 2.14 (Entropy-power inequality). Let $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$ be independent continuous
random variables with entropy h(X) and h(Y), respectively. Let X' and Y' be
independent Gaussian random variables such that h(X') = h(X) and h(Y') = h(Y).
Then,

$$h(X+Y)\geq h(X'+Y'),$$

or, equivalently,

$$2^{2h(X+Y)}\geq2^{2h(X)}+2^{2h(Y)}.$$


2.1.3 Strongly typical sequences


Let $x^n\in\mathcal{X}^n$ be a sequence whose n elements are in a finite alphabet $\mathcal{X}$. The number
of occurrences of a symbol $a\in\mathcal{X}$ in the sequence $x^n$ is denoted by $N(a;x^n)$, and the
empirical distribution (or histogram) of $x^n$ is defined as the set $\{N(a;x^n)/n:a\in\mathcal{X}\}$.


Definition 2.8 (Strong typical set). Let $p_X$ be a distribution on a finite alphabet $\mathcal{X}$ and
let $\epsilon>0$. A sequence $x^n\in\mathcal{X}^n$ is (strongly) $\epsilon$-typical with respect to $p_X$ if

$$\forall a\in\mathcal{X}\quad\left|\frac{1}{n}N(a;x^n)-p_X(a)\right|\leq\epsilon\,p_X(a).$$

The set of all $\epsilon$-typical sequences with respect to $p_X$ is called the strong typical set and
is denoted by $\mathcal{T}_\epsilon^n(X)$.
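Definition 2.8 can be checked directly from the histogram of a sequence. The Python sketch below is a minimal membership test for $\mathcal{T}_\epsilon^n(X)$; note that a symbol with $p_X(a)=0$ must not occur at all, and a symbol outside the alphabet makes the sequence atypical. The three-letter distribution is hypothetical.

```python
from collections import Counter


def is_strongly_typical(seq, pmf, eps):
    """Definition 2.8: |N(a; x^n)/n - p_X(a)| <= eps * p_X(a) for every a in X.
    Symbols outside the support of p_X must therefore not occur at all."""
    n = len(seq)
    counts = Counter(seq)
    if any(a not in pmf for a in counts):
        return False
    return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())


pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
print(is_strongly_typical("aabc" * 25, pmf, 0.1))  # histogram equals pmf
print(is_strongly_typical("a" * 100, pmf, 0.1))    # "b" and "c" never occur
```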


In other words, the typical set $\mathcal{T}_\epsilon^n(X)$ contains all sequences $x^n$ whose empirical
distribution is "close" to $p_X$. The notion of typicality is particularly useful in information
theory because of a result known as the asymptotic equipartition property
(AEP).
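Theorem 2.1 (AEP). Let $p_X$ be a distribution on a finite alphabet $\mathcal{X}$ and let $0<\epsilon<\min_{x\in\mathcal{X}}p_X(x)$.
Let $X^n$ be a sequence of independent and identically distributed (i.i.d.)
random variables with distribution $p_X$. Then,

$$1-\delta_\epsilon(n)\leq\mathbb{P}\big[X^n\in\mathcal{T}_\epsilon^n(X)\big]\leq1,$$

$$(1-\delta_\epsilon(n))\,2^{n(H(X)-\delta(\epsilon))}\leq|\mathcal{T}_\epsilon^n(X)|\leq2^{n(H(X)+\delta(\epsilon))},$$

$$\forall x^n\in\mathcal{T}_\epsilon^n(X)\quad2^{-n(H(X)+\delta(\epsilon))}\leq p_{X^n}(x^n)\leq2^{-n(H(X)-\delta(\epsilon))}.$$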


In simple terms, the AEP states that, for sufficiently large n, the probability that the
realization $x^n$ of a sequence of i.i.d. random variables belongs to the typical set is close
to unity. Moreover, for practical purposes we may assume that the probability of any
strongly typical sequence is about $2^{-nH(X)}$ and the number of strongly typical sequences
is approximately $2^{nH(X)}$. In some sense, the AEP provides an operational interpretation
of entropy.
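For a binary i.i.d. source, membership in $\mathcal{T}_\epsilon^n(X)$ depends only on the number of ones in the sequence, so $\mathbb{P}[X^n\in\mathcal{T}_\epsilon^n(X)]$ and $|\mathcal{T}_\epsilon^n(X)|$ can be computed exactly from binomial coefficients. The Python sketch below does so for a hypothetical source with $\mathbb{P}[X=1]=0.3$ and $\epsilon=0.1$, using exact fractions for the typicality test. It shows the probability of the typical set growing with n, and the exponent $(1/n)\log|\mathcal{T}_\epsilon^n(X)|$ lying somewhat above H(X), which is the $\delta(\epsilon)$ term of Theorem 2.1.

```python
import math
from fractions import Fraction


def typical_counts(n, pmf, eps):
    """Numbers of ones j for which binary sequences with j ones are strongly
    eps-typical (Definition 2.8); pmf = (P[X=0], P[X=1]) given as Fractions."""
    p0, p1 = pmf
    return [j for j in range(n + 1)
            if abs(Fraction(n - j, n) - p0) <= eps * p0
            and abs(Fraction(j, n) - p1) <= eps * p1]


def aep_summary(n, pmf, eps):
    """Return (P[X^n in T], (1/n) log2 |T|) for an i.i.d. binary source."""
    q = float(pmf[1])
    js = typical_counts(n, pmf, eps)
    prob = sum(math.comb(n, j) * q**j * (1 - q) ** (n - j) for j in js)
    size = sum(math.comb(n, j) for j in js)
    if size == 0:
        return prob, float("-inf")
    return prob, math.log2(size) / n


pmf, eps = (Fraction(7, 10), Fraction(3, 10)), Fraction(1, 10)
H = -sum(float(p) * math.log2(float(p)) for p in pmf)
for n in (100, 1000):
    prob, rate = aep_summary(n, pmf, eps)
    print(n, round(prob, 3), round(rate, 3), round(H, 3))
```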


Remark 2.1. It is possible to provide explicit expressions for $\delta_\epsilon(n)$ and $\delta(\epsilon)$ [6], but the
rough characterization used in Theorem 2.1 is sufficient for our purposes. In particular,
it makes it easier to keep track of small terms that depend on $\epsilon$ or n because we can write
equations such as $\delta(\epsilon)+\delta(\epsilon)=\delta(\epsilon)$ without worrying about the exact dependence on $\epsilon$.

The notion of typicality generalizes to multiple random variables. Assume that
$(x^n,y^n)\in\mathcal{X}^n\times\mathcal{Y}^n$ is a pair of sequences with elements in finite alphabets $\mathcal{X}$ and
$\mathcal{Y}$. The number of occurrences of a pair $(a,b)\in\mathcal{X}\times\mathcal{Y}$ in the pair of sequences $(x^n,y^n)$
is denoted by $N(a,b;x^n,y^n)$.
Definition 2.9 (Jointly typical set). Let $p_{XY}$ be a joint distribution on the finite alphabets
$\mathcal{X}\times\mathcal{Y}$ and let $\epsilon>0$. Sequences $x^n\in\mathcal{X}^n$ and $y^n\in\mathcal{Y}^n$ are $\epsilon$-jointly typical with respect
to $p_{XY}$ if

$$\forall(a,b)\in\mathcal{X}\times\mathcal{Y}\quad\left|\frac{1}{n}N(a,b;x^n,y^n)-p_{XY}(a,b)\right|\leq\epsilon\,p_{XY}(a,b).$$

The set of all $\epsilon$-jointly typical sequences with respect to $p_{XY}$ is called the jointly typical
set and is denoted by $\mathcal{T}_\epsilon^n(XY)$.


One can check that $\mathcal{T}_\epsilon^n(XY)\subseteq\mathcal{T}_\epsilon^n(X)\times\mathcal{T}_\epsilon^n(Y)$. In other words, $(x^n,y^n)\in\mathcal{T}_\epsilon^n(XY)$
implies that $x^n\in\mathcal{T}_\epsilon^n(X)$ and $y^n\in\mathcal{T}_\epsilon^n(Y)$. This property is known as the consistency
of joint typicality. Notice that the jointly typical set $\mathcal{T}_\epsilon^n(XY)$ is the typical set $\mathcal{T}_\epsilon^n(Z)$
for the random variable Z = (X, Y). Therefore, the result below follows directly from
Theorem 2.1.


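Corollary 2.1 (Joint AEP). Let $p_{XY}$ be a joint distribution on the finite alphabets
$\mathcal{X}\times\mathcal{Y}$ and let $0<\epsilon<\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}}p_{XY}(x,y)$. Let $(X^n,Y^n)$ be a sequence of i.i.d.
random variables with joint distribution $p_{XY}$. Then,

$$1-\delta_\epsilon(n)\leq\mathbb{P}\big[(X^n,Y^n)\in\mathcal{T}_\epsilon^n(XY)\big]\leq1,$$

$$(1-\delta_\epsilon(n))\,2^{n(H(XY)-\delta(\epsilon))}\leq|\mathcal{T}_\epsilon^n(XY)|\leq2^{n(H(XY)+\delta(\epsilon))},$$

$$\forall(x^n,y^n)\in\mathcal{T}_\epsilon^n(XY)\quad2^{-n(H(XY)+\delta(\epsilon))}\leq p_{X^nY^n}(x^n,y^n)\leq2^{-n(H(XY)-\delta(\epsilon))}.$$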
It is also useful to introduce a conditional typical set, for which we can establish a
conditional version of the AEP. 


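Definition 2.10. Let $p_{XY}$ be a joint distribution on the finite alphabets $\mathcal{X}\times\mathcal{Y}$ and let
$\epsilon>0$. Let $x^n\in\mathcal{T}_\epsilon^n(X)$. The set

$$\mathcal{T}_\epsilon^n(XY|x^n)\triangleq\{y^n\in\mathcal{Y}^n:(x^n,y^n)\in\mathcal{T}_\epsilon^n(XY)\}$$

is called the conditional typical set with respect to $x^n$.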


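Theorem 2.2 (Conditional AEP). Let $p_{XY}$ be a joint distribution on the finite alphabets
$\mathcal{X}\times\mathcal{Y}$ and suppose that $0<\epsilon'<\epsilon\leq\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}}p_{XY}(x,y)$. Let $x^n\in\mathcal{T}_{\epsilon'}^n(X)$ and let
$Y^n$ be a sequence of random variables such that

$$\forall y^n\in\mathcal{Y}^n\quad p_{Y^n|X^n}(y^n|x^n)=\prod_{i=1}^np_{Y|X}(y_i|x_i).$$

Then,

$$1-\delta_{\epsilon\epsilon'}(n)\leq\mathbb{P}\big[Y^n\in\mathcal{T}_\epsilon^n(XY|x^n)\big]\leq1,$$

$$(1-\delta_{\epsilon\epsilon'}(n))\,2^{n(H(Y|X)-\delta(\epsilon))}\leq|\mathcal{T}_\epsilon^n(XY|x^n)|\leq2^{n(H(Y|X)+\delta(\epsilon))},$$

$$\forall y^n\in\mathcal{T}_\epsilon^n(XY|x^n)\quad2^{-n(H(Y|X)+\delta(\epsilon))}\leq p_{Y^n|X^n}(y^n|x^n)\leq2^{-n(H(Y|X)-\delta(\epsilon))}.$$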


The conditional AEP means that, if $x^n$ is a typical sequence and if $Y^n$ is distributed
according to $\prod_{i=1}^np_{Y|X}(y_i|x_i)$, then $Y^n$ is jointly typical with $x^n$ with high probability
for n large enough. In addition, the number of sequences $y^n$ that are jointly typical with
$x^n$ is approximately $2^{nH(Y|X)}$, and their probability is on the order of $2^{-nH(Y|X)}$. The
following corollary of the conditional AEP will be useful.


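Corollary 2.2. Let $p_{XY}$ be a joint distribution on the finite alphabets $\mathcal{X}\times\mathcal{Y}$ and
let $0<\epsilon<\mu_{XY}$ with $\mu_{XY}\triangleq\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}}p_{XY}(x,y)$. Let $\tilde{Y}^n$ be a sequence of i.i.d.
random variables with distribution $p_Y$. Then,

• if $x^n\in\mathcal{T}_\epsilon^n(X)$,

$$(1-\delta_\epsilon(n))\,2^{-n(I(X;Y)+\delta(\epsilon))}\leq\mathbb{P}\big[\tilde{Y}^n\in\mathcal{T}_\epsilon^n(XY|x^n)\big]\leq2^{-n(I(X;Y)-\delta(\epsilon))};$$

• if $\tilde{X}^n$ is a sequence of random variables independent of $\tilde{Y}^n$ and with arbitrary distribution
$p_{\tilde{X}^n}$ on $\mathcal{X}^n$,

$$\mathbb{P}\big[(\tilde{X}^n,\tilde{Y}^n)\in\mathcal{T}_\epsilon^n(XY)\big]\leq2^{-n(I(X;Y)-\delta(\epsilon))}.$$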


In other words, if $\tilde{Y}^n$ is generated independently of $x^n$, the probability that $\tilde{Y}^n$ is
jointly typical with $x^n$ is small and on the order of $2^{-nI(X;Y)}$. Corollary 2.2 generalizes to
more than two random variables; in particular, we make extensive use of the following
result.


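Corollary 2.3. Let $p_{UXY}$ be a joint distribution on the finite alphabets $\mathcal{U}\times\mathcal{X}\times\mathcal{Y}$ and
suppose $0<\epsilon<\mu_{UXY}$ with $\mu_{UXY}\triangleq\min_{(u,x,y)\in\mathcal{U}\times\mathcal{X}\times\mathcal{Y}}p_{UXY}(u,x,y)$. Let $(\tilde{U}^n,\tilde{X}^n)$ be
a sequence of random variables with arbitrary distribution $p_{\tilde{U}^n\tilde{X}^n}$ on $\mathcal{U}^n\times\mathcal{X}^n$. Let $\tilde{Y}^n$
be a sequence of random variables conditionally independent of $\tilde{X}^n$ given $\tilde{U}^n$ such that

$$\forall(u^n,x^n,y^n)\in\mathcal{U}^n\times\mathcal{X}^n\times\mathcal{Y}^n\quad p_{\tilde{U}^n\tilde{X}^n\tilde{Y}^n}(u^n,x^n,y^n)=\Big(\prod_{i=1}^np_{Y|U}(y_i|u_i)\Big)\,p_{\tilde{U}^n\tilde{X}^n}(u^n,x^n).$$

Then,

$$\mathbb{P}\big[(\tilde{U}^n,\tilde{X}^n,\tilde{Y}^n)\in\mathcal{T}_\epsilon^n(UXY)\big]\leq2^{-n(I(X;Y|U)-\delta(\epsilon))}.$$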


2.1.4 Weakly typical sequences


Strong typicality requires the relative frequency of each possible symbol to be close to 
the corresponding probability; however, the notion of strong typicality does not apply 
to continuous random variables and it is sometimes convenient to use a weaker notion 
of typicality, which merely requires the empirical entropy of a sequence to be close to 
the true entropy of the corresponding random variable. All definitions and results in this 
section are stated for discrete random variables but hold also for continuous random 
variables on replacing the entropy H with the differential entropy h. 


Definition 2.11 (Weakly typical set). Let $p_X$ be a distribution on a finite alphabet $\mathcal{X}$
and let $\epsilon>0$. A sequence $x^n\in\mathcal{X}^n$ is (weakly) $\epsilon$-typical with respect to $p_X$ if

$$\left|-\frac{1}{n}\log p_{X^n}(x^n)-H(X)\right|\leq\epsilon.$$

The set of all weakly $\epsilon$-typical sequences with respect to $p_X$ is called the weakly typical
set and is denoted $\mathcal{A}_\epsilon^n(X)$.
The weak version of the AEP then follows from the weak law of large numbers. 


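Theorem 2.3 (AEP). Let $p_X$ be a distribution on a finite alphabet $\mathcal{X}$ and let $\epsilon>0$. Let
$X^n$ be a sequence of i.i.d. random variables with distribution $p_X$. Then,

• for n sufficiently large, $\mathbb{P}[X^n\in\mathcal{A}_\epsilon^n(X)]>1-\epsilon$;

• if $x^n\in\mathcal{A}_\epsilon^n(X)$, then $2^{-n(H(X)+\epsilon)}\leq p_{X^n}(x^n)\leq2^{-n(H(X)-\epsilon)}$;

• for n sufficiently large, $(1-\epsilon)\,2^{n(H(X)-\epsilon)}\leq|\mathcal{A}_\epsilon^n(X)|\leq2^{n(H(X)+\epsilon)}$.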

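Definition 2.12 (Jointly weak typical set). Let $p_{XY}$ be a joint distribution on the finite
alphabets $\mathcal{X}\times\mathcal{Y}$ and let $\epsilon>0$. Sequences $x^n\in\mathcal{X}^n$ and $y^n\in\mathcal{Y}^n$ are jointly (weakly)
$\epsilon$-typical with respect to $p_{XY}$ if

$$\left|-\frac{1}{n}\log p_{X^nY^n}(x^n,y^n)-H(XY)\right|\leq\epsilon,$$

$$\left|-\frac{1}{n}\log p_{X^n}(x^n)-H(X)\right|\leq\epsilon,$$

$$\left|-\frac{1}{n}\log p_{Y^n}(y^n)-H(Y)\right|\leq\epsilon.$$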
The set of all jointly weakly $\epsilon$-typical sequences with respect to $p_{XY}$ is called the jointly
weakly typical set and is denoted $\mathcal{A}_\epsilon^n(XY)$.


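Theorem 2.4 (Joint AEP). Let $p_{XY}$ be a joint distribution on the finite alphabets $\mathcal{X}\times\mathcal{Y}$
and let $\epsilon>0$. Let $(X^n,Y^n)$ be a sequence of i.i.d. random variables with joint distribution
$p_{XY}$. Then,

• for n sufficiently large, $\mathbb{P}[(X^n,Y^n)\in\mathcal{A}_\epsilon^n(XY)]>1-\epsilon$;
• if $(x^n,y^n)\in\mathcal{A}_\epsilon^n(XY)$, then $2^{-n(H(XY)+\epsilon)}\leq p_{X^nY^n}(x^n,y^n)\leq2^{-n(H(XY)-\epsilon)}$;
• for n sufficiently large, $(1-\epsilon)\,2^{n(H(XY)-\epsilon)}\leq|\mathcal{A}_\epsilon^n(XY)|\leq2^{n(H(XY)+\epsilon)}$.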


With weak typicality, there is no exact counterpart to the conditional AEP given in 
Corollary 2.2 but the following result holds nevertheless. 


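Theorem 2.5. Let $p_{XY}$ be a joint distribution on the finite alphabets $\mathcal{X}\times\mathcal{Y}$ and let
$\epsilon>0$. Let $\tilde{Y}^n$ be a sequence of i.i.d. random variables with distribution $p_Y$, and let $\tilde{X}^n$
be an independent sequence of i.i.d. random variables with distribution $p_X$. Then,

$$\mathbb{P}\big[(\tilde{X}^n,\tilde{Y}^n)\in\mathcal{A}_\epsilon^n(XY)\big]\leq2^{-n(I(X;Y)-3\epsilon)}.$$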


In subsequent chapters, we use the term AEP for both strong and weak typicality; 
however, it will be clear from the context whether we refer to the theorems of Section 2.1.3 
or those of Section 2.1.4. 


2.1.5 Markov chains and functional dependence graphs


The identification of Markov chains among random variables that depend on each other 
via complicated relations is a recurrent problem in information theory. In principle, 
Markov chains can be identified by manipulating the joint probability distribution of 
random variables, but this is often a tedious task. In this short section, we describe a 
graphical yet correct method for identifying Markov chains that is based on the functional 
dependence graph of random variables. 


Definition 2.13 (Functional dependence graph). Consider m independent random vari- 
ables and n functions of these variables. A functional dependence graph is a directed 
graph having m + n vertices, and in which edges are drawn from one vertex to another 
if the random variable of the former vertex is an argument in the function defining the 
latter. 


Example 2.1. Let $M\in\mathcal{M}$ and $Z^n\in\mathbb{R}^n$ be independent random variables. Let $\{f_i\}_{i=1}^n$ be
a set of functions from $\mathcal{M}$ to $\mathbb{R}$. For $i\in[\![1,n]\!]$ define the random variables $X_i=f_i(M)$
and $Y_i=X_i+Z_i$. The functional dependence graph of the random variables M, $X^n$, $Y^n$,
and $Z^n$ is shown in Figure 2.1.


Figure 2.1 Functional dependence graph of variables in Example 2.1. For clarity, the independent
random variables are indicated by filled circles (●) whereas the functions of these random variables
are indicated by empty circles (○).


Definition 2.14 (d-separation). Let X, Y, and Z be disjoint subsets of vertices in a
functional dependence graph G. The subset Z is said to d-separate X from Y if there
exists no path between a vertex of X and a vertex of Y after the following operations
have been performed:


• construct the subgraph G' consisting of all vertices in X, Y, and Z, as well as the edges
and vertices encountered when moving backward starting from any of the vertices in
X, Y, or Z;

• in the subgraph G', delete all edges coming out of Z;

• remove all arrows in G' to obtain an undirected graph.


The usefulness of d-separation is justified by the following theorem. 


Theorem 2.6. Let X, Y, and Z be disjoint subsets of the vertices in a functional
dependence graph. If Z d-separates X from Y, and if we collect the random variables in
X, Y, and Z in the random vectors $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$, respectively, then $\mathbf{X}\rightarrow\mathbf{Z}\rightarrow\mathbf{Y}$ forms
a Markov chain.


Theorem 2.6 is particularly useful in the converse proofs of channel coding theorems. 


Example 2.2. On the basis of the functional dependence graph of Figure 2.1, one can
check that, for any $i\neq j$, $X_i\rightarrow X_j\rightarrow Y_j$ forms a Markov chain.
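The three operations of Definition 2.14 are mechanical enough to be automated. The Python sketch below is a direct transcription of them for a graph stored as a set of directed edges, applied to the graph of Example 2.1 with a hypothetical n = 3 and vertex names of our choosing ("M", "X1", "Y1", "Z1", and so on). It confirms Example 2.2 for i = 1 and j = 2, and also shows that conditioning on a common child can connect two independent parents.

```python
from collections import deque


def d_separates(edges, xs, ys, zs):
    """Check whether the vertex set zs d-separates xs from ys in a functional
    dependence graph given as a set of directed edges (u, v), following the
    three operations of Definition 2.14."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Operation 1: keep xs, ys, zs and every vertex reached by moving backward.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    sub = {(u, v) for u, v in edges if u in keep and v in keep}
    # Operation 2: delete all edges coming out of zs.
    sub = {(u, v) for u, v in sub if u not in zs}
    # Operation 3: drop the arrows, then search for a path from xs to ys.
    neighbors = {v: set() for v in keep}
    for u, v in sub:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, queue = set(xs), deque(xs)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in neighbors[v] - seen:
            seen.add(w)
            queue.append(w)
    return True


def example_graph(n):
    """Edges of Figure 2.1: M -> X_i, X_i -> Y_i and Z_i -> Y_i for i = 1..n."""
    edges = set()
    for i in range(1, n + 1):
        edges |= {("M", f"X{i}"), (f"X{i}", f"Y{i}"), (f"Z{i}", f"Y{i}")}
    return edges


g = example_graph(3)
print(d_separates(g, {"X1"}, {"Y2"}, {"X2"}))  # Example 2.2 with i = 1, j = 2
print(d_separates(g, {"X1"}, {"Y2"}, set()))   # X1 and Y2 share the ancestor M
```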


2.2 The point-to-point communication problem


The foundations of information theory were laid by Claude E. Shannon in his 1948 
paper “A mathematical theory of communication” [7]. In his own words, the funda- 
mental problem of communication is that of reproducing at one point either exactly or 
approximately a message selected at another point. If the message — for example, a letter 
from the alphabet, the gray level of a pixel or some physical quantity measured by a 
sensor — is to be reproduced at a remote location with a certain fidelity, some amount of 
information must be transmitted over a physical channel. This observation is the basis 
of Shannon’s general model for point-to-point communication reproduced in Figure 2.2. 
It consists of the following elements. 
Figure 2.2 Shannon's communication model (from [7]).

Figure 2.3 Mathematical model of a two-stage communication system.


• The information source generates messages according to some random process.

• The transmitter observes the messages and forms a signal to be sent over the
channel.

• The channel is governed by a noise source, which corrupts the original input signal;
this models the physical constraints of a communication system, for instance thermal
noise in electronic circuits or multipath fading in a wireless medium.

• The receiver takes the received signal, forms a reconstructed version of the original
message, and delivers the result to the destination.


Given the statistical properties of the information source and the noisy channel, the 
goal of the communication engineer is to design the transmitter and the receiver in a 
way that allows the information sent by the source to reach its destination in a reliable 
way. Information theory can help us achieve this goal by characterizing the fundamental 
mechanisms behind communication systems and providing us with precise mathematical 
conditions under which reliable communication is possible. 


2.2.1 Point-to-point communication model


To give a precise formulation of the point-to-point communication problem, we require
definitions for each of its constituent modules. We assume that the source and the channel
are described by discrete-time random processes, and that the receiver and
the transmitter agree on a common code, specified by an encoder and decoder pair.
As illustrated in Figure 2.3, we consider a two-stage system in which the source is
compressed before being encoded for channel transmission, and channel outputs are
decoded before being decompressed. The basic relationships among the components in
Figure 2.3 are described below.
Definition 2.15. A discrete memoryless source (DMS) $(\mathcal{U},p_U)$ generates a sequence
of i.i.d. symbols (or letters) from the finite alphabet $\mathcal{U}$ according to the probability
distribution $p_U$. The random variable representing a source symbol is denoted by U.


Definition 2.16. A discrete memoryless channel (DMC) $(\mathcal{X},p_{Y|X},\mathcal{Y})$ is described by
a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, and a conditional probability
distribution $p_{Y|X}$, such that X and Y denote the channel input and the channel output,
respectively. The set of conditional probabilities (also called transition probabilities)
can be represented by a channel transition probability matrix $(p_{Y|X}(y|x))_{x\in\mathcal{X},y\in\mathcal{Y}}$.


In what follows, we illustrate many results numerically with the following DMCs. 


Example 2.3. A binary symmetric channel with cross-over probability $p\in[0,1]$,
denoted by BSC(p), is a DMC $(\{0,1\},p_{Y|X},\{0,1\})$ characterized by the transition
probability matrix

$$\begin{pmatrix}1-p&p\\p&1-p\end{pmatrix}.$$


Example 2.4. A binary erasure channel with erasure probability $\epsilon\in[0,1]$, denoted by
BEC($\epsilon$), is a DMC $(\{0,1\},p_{Y|X},\{0,?,1\})$ characterized by the transition probability
matrix

$$\begin{pmatrix}1-\epsilon&\epsilon&0\\0&\epsilon&1-\epsilon\end{pmatrix}.$$
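The two transition matrices of Examples 2.3 and 2.4 can be encoded as lists of rows (one row per input symbol), from which I(X;Y) follows as $H(Y)-H(Y|X)$ for any input distribution. The Python sketch below evaluates it for a uniform input and hypothetical parameters p = 0.11 and $\epsilon$ = 0.3; for a uniform input one expects $1-H_b(p)$ for the BSC and $1-\epsilon$ for the BEC.

```python
import math


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bsc(p):
    """Rows: inputs 0, 1; columns: outputs 0, 1."""
    return [[1 - p, p], [p, 1 - p]]


def bec(eps):
    """Rows: inputs 0, 1; columns: outputs 0, ?, 1."""
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]


def mutual_information(px, w):
    """I(X;Y) = H(Y) - H(Y|X) for input pmf px and transition matrix w."""
    py = [sum(px[x] * w[x][y] for x in range(len(px))) for y in range(len(w[0]))]
    h_y_given_x = sum(px[x] * entropy(w[x]) for x in range(len(px)))
    return entropy(py) - h_y_given_x


uniform = [0.5, 0.5]
print(mutual_information(uniform, bsc(0.11)), 1 - entropy([0.11, 0.89]))
print(mutual_information(uniform, bec(0.3)), 1 - 0.3)
```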


Definition 2.17. A $(2^{kR},k)$ source code $\mathcal{C}_k$ for a DMS $(\mathcal{U},p_U)$ consists of

• a message set $\mathcal{M}=[\![1,\lceil2^{kR}\rceil]\!]$;

• an encoding function $e:\mathcal{U}^k\to\mathcal{M}$, which maps a sequence of k source symbols $u^k$
to a message m;

• a decoding function $d:\mathcal{M}\to\mathcal{U}^k\cup\{?\}$, which maps a message m to a sequence of
source symbols $\hat{u}^k\in\mathcal{U}^k$ or an error message ?.


The compression rate of the source code is defined as $(1/k)\log\lceil2^{kR}\rceil$ in bits¹ per
source symbol, and its probability of error is

$$P_e(\mathcal{C}_k)\triangleq\mathbb{P}\big[\hat{U}^k\neq U^k\,|\,\mathcal{C}_k\big].$$


Definition 2.18. A rate R is an achievable compression rate for the source $(\mathcal{U},p_U)$ if
there exists a sequence of $(2^{kR},k)$ source codes $\{\mathcal{C}_k\}_{k\geq1}$ such that

$$\lim_{k\to\infty}P_e(\mathcal{C}_k)=0,$$

that is, the source sequences can be reconstructed with arbitrarily small probability of
error with compression rates arbitrarily close to R.


¹ Unless specified otherwise, all logarithms are taken to the base two.
Definition 2.19. A $(2^{nR},n)$ channel code $\mathcal{C}_n$ for a DMC $(\mathcal{X},p_{Y|X},\mathcal{Y})$ consists of

• a message set $\mathcal{M}=[\![1,\lceil2^{nR}\rceil]\!]$;

• an encoding function $f:\mathcal{M}\to\mathcal{X}^n$, which maps a message m to a codeword $x^n$ with
n symbols;

• a decoding function $g:\mathcal{Y}^n\to\mathcal{M}\cup\{?\}$, which maps a block of n channel outputs $y^n$
to a message $\hat{m}\in\mathcal{M}$ or an error message ?.


The set of codewords $\{f(m):m\in[\![1,\lceil2^{nR}\rceil]\!]\}$ is called the codebook of $\mathcal{C}_n$. With a
slight abuse of notation, we denote the codebook itself by $\mathcal{C}_n$ as well. Unless specified
otherwise, messages are represented by a random variable M uniformly distributed in
$\mathcal{M}$, and the rate of the channel code is defined as $(1/n)\log\lceil2^{nR}\rceil$ in bits per channel use.
The average probability of error is defined as

$$P_e(\mathcal{C}_n)\triangleq\mathbb{P}\big[\hat{M}\neq M\,|\,\mathcal{C}_n\big].$$


Definition 2.20. A rate R is an achievable transmission rate for the DMC $(\mathcal{X},p_{Y|X},\mathcal{Y})$
if there exists a sequence of $(2^{nR},n)$ codes $\{\mathcal{C}_n\}_{n\geq1}$ such that

$$\lim_{n\to\infty}P_e(\mathcal{C}_n)=0;$$


that is, messages can be transmitted at a rate arbitrarily close to R and decoded with 
arbitrarily small probability of error. The channel capacity of the DMC is defined as 


C = sup{R : R is an achievable transmission rate}. 


The typical goal of information theory is to characterize achievable rates on the basis 
of information-theoretic quantities that depend only on the given probability distribu- 
tions and not on the block lengths k or n. A theorem that confirms the existence of codes 
for a class of achievable rates is often referred to as a direct result and the arguments that 
lead to this result constitute an achievability proof. On the other hand, when a theorem 
asserts that codes with certain properties do not exist, we speak of a converse result 
and a converse proof. A fundamental result that includes both the achievability and the 
converse parts is called a coding theorem. The mathematical tools that enable this charac- 
terization are those presented in Section 2.1, and we illustrate their use by discussing two 
of Shannon’s fundamental coding theorems. These results form the basis of information 
theory and are of great use in several of the proofs developed in subsequent chapters. 


Remark 2.2. Notice that the formulation of the point-to-point communication prob- 
lem does not put any constraints either on the computational complexity or on the 
delay of the encoding and decoding procedures. In other words, the goal is to describe 
the fundamental limits of communication systems irrespective of their technological
limitations. 


2.2.2 The source coding theorem


The source coding theorem gives a complete solution (achievability and converse) for 
the point-to-point communication problem stated in Section 2.2.1 when the channel 
is noiseless, that is Y = X. In that case, it is not necessary to use a channel code to
compensate for the impairments caused by the channel, but it is still useful to encode
the messages produced by the source to achieve a more efficient representation of the
source information in bits per source symbol. This procedure is called source coding or
data compression. The main idea is to consider only a subset $\mathcal{A}$ of all possible source
sequences $\mathcal{U}^k$, and assign a different index $i\in[\![1,|\mathcal{A}|]\!]$ to each of the sequences $u^k\in\mathcal{A}$.
If the source produces a sequence $u^k\in\mathcal{A}$, then the encoder outputs the corresponding
index i; otherwise it outputs some predefined constant. The decoder receives the index
i and outputs the corresponding sequence in $\mathcal{A}$.

Since information theory is primarily concerned with the fundamental limits of reliable 
communication, it is possible to prove the existence of codes without having to search for 
explicit code constructions. One technique, which is particularly useful in information-theoretic
problems related to source coding, consists of throwing sequences $u^n\in\mathcal{U}^n$
randomly into a finite set of bins, such that the sequences that land in the same bin 
share a common bin index. If each sequence is assigned a bin at random according to a 
uniform distribution, then we refer to this procedure as random binning. If we want to 
prove that there exists a code such that the error probability goes to zero, it suffices to 
show that the average of the probability of error taken over all possible bin assignments 
goes to zero and to use the selection lemma. The following theorem exploits random 
binning to characterize the set of achievable compression rates. 


Theorem 2.7 (Source coding theorem). For a discrete memoryless source $(\mathcal{U},p_U)$,

$$\inf\{R:R\text{ is an achievable compression rate}\}=H(U).$$

In other words, if a compression rate R satisfies R > H(U), then R is achievable, and
any achievable compression rate R must satisfy $R\geq H(U)$.


Proof. We start with the achievability part of the proof, which is based on random
binning. The idea is to randomly assign each source sequence to one of a finite number
of bins; then, as long as the number of bins is larger than $2^{kH(U)}$, the probability of
finding more than one typical sequence in the same bin is very small. If each typical
sequence is mapped to a different bin index, an arbitrarily small probability of error can
be achieved by letting the decoder output the typical sequence that corresponds to the
received index. Formally, let $\epsilon>0$ and $k\in\mathbb{N}^*$. Let R > 0 be a rate to be specified later.
We construct a $(2^{kR},k)$ source code $\mathcal{C}_k$ as follows.


• Binning. For each sequence $u^k\in\mathcal{T}_\epsilon^k(U)$, draw an index uniformly at random in the
set $[\![1,\lceil2^{kR}\rceil]\!]$. The index assignment defines the encoding function

$$e:\mathcal{U}^k\to[\![1,\lceil2^{kR}\rceil]\!],$$

which is revealed to the encoder and decoder.

• Encoder. Given an observation $u^k$, output $m=e(u^k)$ if $u^k\in\mathcal{T}_\epsilon^k(U)$; otherwise output
m = 1.

• Decoder. Given message m, output $\hat{u}^k$ if it is the unique sequence such that $\hat{u}^k\in\mathcal{T}_\epsilon^k(U)$
and $e(\hat{u}^k)=m$; otherwise output an error ?.
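The binning construction above can be sketched in a few lines of Python for a toy binary source. Everything below is a hypothetical illustration: the source distribution $p_U=(0.8,0.2)$, the block length k = 12, $\epsilon=0.5$, R = 1, and the fixed random seed. At such a short block length most source sequences are not yet typical, so the probability of error is dominated by the event $\{U^k\notin\mathcal{T}_\epsilon^k(U)\}$; the sketch only illustrates the mechanics of the encoder and decoder.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def strongly_typical(seq, pmf, eps):
    """Definition 2.8 for a sequence whose symbols all lie in the support of pmf."""
    n, counts = len(seq), Counter(seq)
    return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())


def sequence_probability(seq, pmf):
    return math.prod(pmf[a] for a in seq)


def build_binning_code(k, pmf, eps, rate, seed=0):
    """Random binning: every eps-typical u^k gets an index drawn uniformly at
    random from {1, ..., ceil(2^(kR))}."""
    rng = random.Random(seed)
    num_bins = math.ceil(2 ** (k * rate))
    typical = [u for u in itertools.product(sorted(pmf), repeat=k)
               if strongly_typical(u, pmf, eps)]
    index = {u: rng.randint(1, num_bins) for u in typical}
    bins = defaultdict(list)
    for u, m in index.items():
        bins[m].append(u)

    def encode(u):
        # Typical sequences are sent as their bin index; all others as m = 1.
        return index.get(tuple(u), 1)

    def decode(m):
        # Unique typical sequence in bin m, otherwise the error symbol "?".
        candidates = bins.get(m, [])
        return candidates[0] if len(candidates) == 1 else "?"

    return encode, decode, typical


def error_probability(encode, decode, k, pmf):
    """Exact P_e of one realization of the code, by enumerating all of U^k."""
    return sum(sequence_probability(u, pmf)
               for u in itertools.product(sorted(pmf), repeat=k)
               if decode(encode(u)) != u)


pmf = {0: 0.8, 1: 0.2}
encode, decode, typical = build_binning_code(k=12, pmf=pmf, eps=0.5, rate=1.0)
print(len(typical), error_probability(encode, decode, 12, pmf))
```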
The random variable that represents the randomly generated encoding function e is
denoted by E, while the random variable that represents the randomly generated code
$\mathcal{C}_k$ is denoted by $C_k$. We proceed to bound $\mathbb{E}[P_e(C_k)]$. First, note that $\mathbb{E}[P_e(C_k)]$ can be
expressed in terms of the events


E = {U4 ¢ TÉU), 
€, = {Bù 4 uF : E@) = ECU) and ù* € TEW) 


as E[P,(C,)] = P[Eo U £1]. By the union bound, 
=[P.(Cy)] < P[Eo] + PIE]. (2.1) 

By the AEP,
$$\mathbb{P}[\mathcal{E}_0] \leq \delta_\epsilon(k) \tag{2.2}$$


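and we can upper bound $\mathbb{P}[\mathcal{E}_1]$ as
$$
\begin{aligned}
\mathbb{P}[\mathcal{E}_1] &= \sum_{u^k} p_{U^k}(u^k)\, \mathbb{P}\big[\exists\, \hat{u}^k \neq u^k : E(\hat{u}^k) = E(u^k) \text{ and } \hat{u}^k \in \mathcal{T}_\epsilon^k(U)\big]\\
&\leq \sum_{u^k} p_{U^k}(u^k) \sum_{\substack{\hat{u}^k \in \mathcal{T}_\epsilon^k(U)\\ \hat{u}^k \neq u^k}} \mathbb{P}\big[E(\hat{u}^k) = E(u^k)\big]\\
&= \sum_{u^k} p_{U^k}(u^k) \sum_{\substack{\hat{u}^k \in \mathcal{T}_\epsilon^k(U)\\ \hat{u}^k \neq u^k}} \frac{1}{\lceil 2^{kR} \rceil}\\
&\leq \sum_{u^k} p_{U^k}(u^k)\, \frac{1}{2^{kR}}\, \big|\mathcal{T}_\epsilon^k(U)\big|\\
&\leq \sum_{u^k} p_{U^k}(u^k)\, \frac{2^{k(H(U)+\delta(\epsilon))}}{2^{kR}}\\
&= 2^{k(H(U)+\delta(\epsilon)-R)}.
\end{aligned}
$$
Hence, if we choose $R > H(U) + \delta(\epsilon)$, we have
$$\mathbb{P}[\mathcal{E}_1] \leq \delta_\epsilon(k). \tag{2.3}$$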


On substituting (2.2) and (2.3) into (2.1), we obtain $\mathbb{E}[P_e(\mathsf{C}_k)] \leq \delta_\epsilon(k)$. By applying the selection lemma to the random variable $\mathsf{C}_k$ and the function $P_e$, we conclude that there exists at least one source code $\mathcal{C}_k$ such that $P_e(\mathcal{C}_k) \leq \delta_\epsilon(k)$. Since $\epsilon$ can be chosen arbitrarily small, all rates $R > H(U)$ are achievable.

We now establish the converse result and show that any achievable rate must satisfy $R \geq H(U)$. Let $R$ be an achievable rate and let $\epsilon > 0$. By definition, for $k$ sufficiently large there exists a $(2^{kR}, k)$ source code $\mathcal{C}_k$ such that $P_e(\mathcal{C}_k) \leq \delta(\epsilon)$. If we let $M$ denote the message output by the encoder, then Fano's inequality guarantees that
$$\frac{1}{k} H(U^k \mid M \mathcal{C}_k) \leq \delta(P_e(\mathcal{C}_k)) \leq \delta(\epsilon).$$


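We drop the conditioning on $\mathcal{C}_k$ in subsequent calculations to simplify the notation. Note that
$$
\begin{aligned}
H(U) &= \frac{1}{k} H(U^k)\\
&= \frac{1}{k} I(U^k; M) + \frac{1}{k} H(U^k \mid M)\\
&\leq \frac{1}{k} I(U^k; M) + \delta(\epsilon)\\
&\leq \frac{1}{k} H(M) + \delta(\epsilon)\\
&\leq R + \delta(k) + \delta(\epsilon),
\end{aligned}
$$
where the last inequality holds because $M$ takes at most $\lceil 2^{kR} \rceil$ values.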


Since $\epsilon$ can be chosen arbitrarily small and $k$ can be chosen arbitrarily large, we obtain $R \geq H(U)$.
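The heart of the achievability argument is the bound on $\mathbb{P}[\mathcal{E}_1]$. The Python sketch below illustrates that bound under simplifying assumptions and is not part of the proof.

- It abstracts the typical set as `num_typical` labeled sequences, about $2^{k(H(U)+\delta(\epsilon))}$ of them.
- It assigns each sequence a uniform bin among `num_bins` $= 2^{kR}$ bins.
- It estimates by simulation the probability that a given typical sequence shares its bin with another typical sequence.
- It compares that estimate with the exact value $1-(1-1/B)^{N-1}$ and with the union bound $(N-1)/B$.

The helper names and the numbers in the usage lines ($k = 20$, $H(U) + \delta(\epsilon) = 0.5$) are hypothetical.

```python
import random
from collections import Counter

def binning_error_rate(num_typical, num_bins, trials=50, seed=0):
    """Monte Carlo estimate of P[E_1]: the probability that a typical
    sequence shares its random bin with at least one other typical sequence."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # random binning: one uniform bin index per typical sequence
        bins = [rng.randrange(num_bins) for _ in range(num_typical)]
        occupancy = Counter(bins)
        # the decoder fails for every sequence that lands in a shared bin
        collided = sum(c for c in occupancy.values() if c >= 2)
        total += collided / num_typical
    return total / trials

def exact_error_rate(num_typical, num_bins):
    # complement of "none of the other N - 1 sequences falls in the same bin"
    return 1 - (1 - 1 / num_bins) ** (num_typical - 1)

def union_bound(num_typical, num_bins):
    # the bound used in the proof: (number of other typical sequences) / (number of bins)
    return (num_typical - 1) / num_bins

# Hypothetical numbers: k = 20 and H(U) + delta(eps) = 0.5, so about 2^10 typical sequences.
N = 2 ** 10
for R in (0.5, 0.75):
    B = 2 ** round(20 * R)  # number of bins, 2^{kR}
    print(R, round(binning_error_rate(N, B), 3),
          round(exact_error_rate(N, B), 3), round(union_bound(N, B), 3))
```

With $R = 0.5$ the number of bins equals the number of typical sequences, and collisions are frequent. With $R = 0.75$ the estimate, the exact value, and the union bound all drop to about 3%, in line with the exponent $k(H(U)+\delta(\epsilon)-R)$.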


Remark 2.3. Alternatively, the achievability part of the source coding theorem can be established on the basis of the AEP alone. In fact, for large $k$ the AEP guarantees that any sequence $u^k$ produced by the source $(\mathcal{U}, p_U)$ belongs with high probability to the typical set $\mathcal{T}_\epsilon^k(U)$. Hence, we need only index the approximately $2^{kH(U)}$ typical sequences to achieve an arbitrarily small probability of error, and the corresponding rate is on the order of $H(U)$.
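Remark 2.3 can be checked numerically for a binary source. The sketch below makes three assumptions:

- The source is Bernoulli with $p_U(1) = p$.
- Typicality means that, for each symbol $a$, the fraction of positions of $u^k$ equal to $a$ is within $\epsilon\, p_U(a)$ of $p_U(a)$. This is an illustrative definition; adjust it if your definition of $\mathcal{T}_\epsilon^k(U)$ differs.
- The function names are hypothetical.

The code counts the typical sequences exactly and computes their total probability.

```python
from math import comb, exp, lgamma, log, log2

def binary_entropy(p):
    """H(p) in bits, with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def typical_set_stats(p, k, eps):
    """Size and probability of the binary k-sequences whose empirical symbol
    frequencies are within eps * p_U(a) of p_U(a) for a in {0, 1}, where
    p_U(1) = p and 0 < p < 1 (assumed typicality condition, for illustration)."""
    size, prob = 0, 0.0
    for ones in range(k + 1):
        f1 = ones / k
        if abs(f1 - p) <= eps * p and abs((1 - f1) - (1 - p)) <= eps * (1 - p):
            size += comb(k, ones)  # exact integer count of such sequences
            # log of comb(k, ones) * p^ones * (1-p)^(k-ones), to avoid under/overflow
            log_mass = (lgamma(k + 1) - lgamma(ones + 1) - lgamma(k - ones + 1)
                        + ones * log(p) + (k - ones) * log(1 - p))
            prob += exp(log_mass)
    return size, prob

p, eps = 0.2, 0.1
print("H(U) =", round(binary_entropy(p), 4))
for k in (100, 1000, 5000):
    size, prob = typical_set_stats(p, k, eps)
    # log2(size) / k is the rate needed to index the typical set
    print(k, round(log2(size) / k, 4), round(prob, 4))
```

As $k$ grows, the probability of the typical set approaches one. Meanwhile $\frac{1}{k}\log_2|\mathcal{T}_\epsilon^k(U)|$ stays within a term of order $\epsilon$ of $H(U) \approx 0.722$ bits, so indexing only the typical sequences costs a rate close to $H(U)$.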


The channel coding theorem 


The channel coding theorem gives a complete solution (achievability and converse) for the point-to-point communication problem stated in Section 2.2.1 when the source $(\mathcal{U}, p_U)$ is uniform over $\mathcal{U}$. According to the source coding theorem, there is no need to encode the source since $H(U) = \log|\mathcal{U}|$ is maximal. We simply group the source symbols in sequences of length $k$. Letting $M \triangleq |\mathcal{U}|^k$ and $\mathcal{M} = \llbracket 1, M \rrbracket$, we index each sequence of length $k$ with an integer $m \in \mathcal{M}$. We use a channel code of rate $(1/n)\log M$ to transmit the messages produced by source $U$ over a discrete memoryless channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.

As was done for the source coding theorem, it is possible to prove the existence of 
codes without having to search for explicit code constructions. The idea is to construct a 
random code by drawing the symbols of codewords independently at random according to a fixed probability distribution $p_X$ on $\mathcal{X}$. Then, if we want to prove that there exists a code such that the error probability goes to zero for $n$ sufficiently large, it suffices to show that the average of the probability of error taken over all possible random codebooks goes to zero for $n$ sufficiently large and to use the selection lemma. This technique is referred to as random coding and is used in the proof of the following theorem to characterize the set of achievable rates.


Theorem 2.8 (Channel coding theorem). The capacity of a DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is $C = \max_{p_X} I(X;Y)$. In other words, if $R < C$ then $R$ is an achievable transmission rate, and any achievable transmission rate must satisfy $R \leq C$.


Proof. We begin with the achievability part based on random coding. We choose a probability distribution $p_X$ on $\mathcal{X}$ and, without loss of generality, we assume that $p_X$ is such that $I(X;Y) > 0$. Let $0 < \epsilon < \mu_{XY}$, where $\mu_{XY} = \min_{(x,y) \in \mathcal{X} \times \mathcal{Y}} p_{XY}(x,y)$, and let $n \in \mathbb{N}^*$. Let $R > 0$ be a rate to be specified later. We construct a $(2^{nR}, n)$ code $\mathcal{C}_n$ as follows.


• Codebook construction. Construct a codebook with $\lceil 2^{nR} \rceil$ codewords, labeled $x^n(m)$ with $m \in \llbracket 1, 2^{nR} \rrbracket$, by generating the symbols $x_i(m)$ for $i \in \llbracket 1, n \rrbracket$ and $m \in \llbracket 1, 2^{nR} \rrbracket$ independently according to $p_X$. The codebook is revealed both to the encoder and to the decoder.

• Encoder $f$. Given $m$, transmit $x^n(m)$.

• Decoder $g$. Given $y^n$, output $\hat{m}$ if it is the unique message such that $(x^n(\hat{m}), y^n) \in \mathcal{T}_\epsilon^n(XY)$; otherwise, output an error ?.


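The random variable that represents the randomly generated codebook $\mathcal{C}_n$ is denoted by $\mathsf{C}_n$. We first develop an upper bound for $\mathbb{E}[P_e(\mathsf{C}_n)]$. Notice that
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \sum_{m=1}^{\lceil 2^{nR} \rceil} \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\hat{M} \neq M \mid M = m, \mathsf{C}_n]\big]\, p_M(m).$$
By virtue of the symmetry of the random code construction, $\mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\hat{M} \neq M \mid M = m, \mathsf{C}_n]\big]$ is independent of $m$. Therefore, we can assume without loss of generality that message $m = 1$ was sent and write
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\hat{M} \neq M \mid M = 1, \mathsf{C}_n]\big].$$
Notice that $\mathbb{E}[P_e(\mathsf{C}_n)]$ can be expressed in terms of the events
$$\mathcal{E}_i = \{(X^n(i), Y^n) \in \mathcal{T}_\epsilon^n(XY)\} \quad \text{for } i \in \llbracket 1, 2^{nR} \rrbracket$$
as $\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{P}\big[\mathcal{E}_1^c \cup \bigcup_{i \neq 1} \mathcal{E}_i\big]$. By the union bound,
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \mathbb{P}[\mathcal{E}_1^c] + \sum_{i \neq 1} \mathbb{P}[\mathcal{E}_i]. \tag{2.4}$$
By the AEP,
$$\mathbb{P}[\mathcal{E}_1^c] \leq \delta_\epsilon(n). \tag{2.5}$$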


Since $Y^n$ is the output of the channel when $X^n(1)$ is transmitted, and since $X^n(1)$ is independent of $X^n(i)$ for $i \neq 1$, $Y^n$ is independent of $X^n(i)$ for $i \neq 1$. Hence, Corollary 2.2 applies and
$$\mathbb{P}[\mathcal{E}_i] \leq 2^{-n(I(X;Y) - \delta(\epsilon))} \quad \text{for } i \neq 1. \tag{2.6}$$
On substituting (2.5) and (2.6) into (2.4), we obtain
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n) + 2^{n(R - I(X;Y) + \delta(\epsilon))}.$$
Hence, if we choose the rate $R$ such that $R < I(X;Y) - \delta(\epsilon)$, then
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n).$$


By applying the selection lemma to the random variable $\mathsf{C}_n$ and the function $P_e$, we conclude that there exists a $(2^{nR}, n)$ code $\mathcal{C}_n$ such that $P_e(\mathcal{C}_n) \leq \delta_\epsilon(n)$. Since $\epsilon$ can be chosen arbitrarily small and since the distribution $p_X$ is arbitrary, we conclude that all rates $R < \max_{p_X} I(X;Y)$ are achievable.

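We now establish the converse part of the proof. Let $R$ be an achievable rate and let $\epsilon > 0$. For $n$ sufficiently large, there exists a $(2^{nR}, n)$ code $\mathcal{C}_n$ such that
$$\frac{1}{n} H(M \mid \mathcal{C}_n) \geq R \quad \text{and} \quad P_e(\mathcal{C}_n) \leq \delta(\epsilon).$$
In the remaining part of the proof we drop the conditioning on $\mathcal{C}_n$ to simplify the notation. By virtue of Fano's inequality, it also holds that
$$\frac{1}{n} H(M \mid Y^n) \leq \delta(P_e(\mathcal{C}_n)) \leq \delta(\epsilon).$$
Therefore,
$$
\begin{aligned}
R &\leq \frac{1}{n} H(M)\\
&= \frac{1}{n} I(M; Y^n) + \frac{1}{n} H(M \mid Y^n)\\
&\leq \frac{1}{n} I(M; Y^n) + \delta(\epsilon)\\
&\overset{(a)}{\leq} \frac{1}{n} I(X^n; Y^n) + \delta(\epsilon)\\
&= \frac{1}{n} H(Y^n) - \frac{1}{n} H(Y^n \mid X^n) + \delta(\epsilon)\\
&\overset{(b)}{=} \frac{1}{n} \sum_{i=1}^n H(Y_i \mid Y^{i-1}) - \frac{1}{n} \sum_{i=1}^n H(Y_i \mid X_i) + \delta(\epsilon)\\
&\overset{(c)}{\leq} \frac{1}{n} \sum_{i=1}^n \big(H(Y_i) - H(Y_i \mid X_i)\big) + \delta(\epsilon)\\
&= \frac{1}{n} \sum_{i=1}^n I(X_i; Y_i) + \delta(\epsilon)\\
&\leq \max_{p_X} I(X;Y) + \delta(\epsilon),
\end{aligned}
$$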
where (a) follows from the data-processing inequality applied to the Markov chain $M - X^n - Y^n$, (b) follows from the chain rule and because the channel is memoryless, and (c) follows because conditioning does not increase entropy. Since $\epsilon$ can be chosen arbitrarily small, we obtain $R \leq \max_{p_X} I(X;Y)$.


The channel coding theorem shows that the channel capacity is equal to the maximum 
mutual information between the channel input X and the channel output Y, where the 
maximization is carried out over all possible input probability distributions $p_X$. The
proof technique and structure are common to most proofs in subsequent chapters. 


Example 2.5. The capacity of a binary symmetric channel BSC($p$) is $1 - H_b(p)$, where $H_b$ denotes the binary entropy function. The capacity of a binary erasure channel BEC($\epsilon$) is $1 - \epsilon$.
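The maximization in Theorem 2.8 rarely has a closed form. A standard numerical method, not discussed in this chapter, is the Blahut–Arimoto algorithm. It alternates between two steps:

1. Compute the output distribution.
2. Reweight each input symbol $x$ by $2^{D(p_{Y|X=x} \| p_Y)}$.

The pure-Python sketch below represents the channel as a list of rows $p_{Y|X}(\cdot|x)$; the function names are hypothetical. It recovers the values of Example 2.5 for BSC(0.11) and BEC(0.3).

```python
from math import log2

def binary_entropy(p):
    """H(p) in bits, with the convention 0 log 0 = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(px, W):
    """I(X;Y) in bits for input pmf px and channel matrix W[x][y] = p_{Y|X}(y|x)."""
    nx, ny = len(W), len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(px[x] * W[x][y] * log2(W[x][y] / py[y])
               for x in range(nx) for y in range(ny) if px[x] > 0 and W[x][y] > 0)

def blahut_arimoto(W, iters=2000):
    """Approximate C = max over p_X of I(X;Y) with Blahut-Arimoto updates."""
    nx, ny = len(W), len(W[0])
    px = [1.0 / nx] * nx  # start from the uniform input distribution
    for _ in range(iters):
        py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # relative entropy D(p_{Y|X=x} || p_Y) in bits for every input symbol x
        d = [sum(W[x][y] * log2(W[x][y] / py[y]) for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        # multiplicative update p_X(x) <- p_X(x) * 2^{D(x)}, then normalize
        px = [px[x] * 2 ** d[x] for x in range(nx)]
        total = sum(px)
        px = [v / total for v in px]
    return mutual_information(px, W), px

p, e = 0.11, 0.3
bsc = [[1 - p, p], [p, 1 - p]]
bec = [[1 - e, e, 0.0], [0.0, e, 1 - e]]  # outputs: 0, erasure, 1
print(blahut_arimoto(bsc)[0], 1 - binary_entropy(p))
print(blahut_arimoto(bec)[0], 1 - e)
```

For the BSC and BEC the uniform input is already optimal, so the iterations only confirm it. For an asymmetric channel, such as a Z-channel, the optimal $p_X$ is not uniform and the updates move away from the starting point.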


Among the many channel models, the additive white Gaussian noise (AWGN) channel (Gaussian channel for short) plays a particularly prominent role in information and communication theory, because it captures the impact of thermal noise and interference on wired and wireless communications. The channel output at each time $i \geq 1$ is given by $Y_i = X_i + N_i$, where $X_i$ denotes the transmitted symbol and $\{N_i\}_{i \geq 1}$ are i.i.d. random variables with distribution $\mathcal{N}(0, \sigma^2)$. Since the capacity of the Gaussian channel can be infinite without further restrictions, we add an average power constraint in the form of
$$\frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i^2] \leq P.$$


Theorem 2.9. The capacity of a Gaussian channel is given by
$$C = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right),$$
where $P$ denotes the power constraint and $\sigma^2$ is the variance of the noise.


Sketch of proof. The proof developed for Theorem 2.8 does not apply directly to the 
Gaussian channel because of the power constraint imposed on channel inputs and the 
continuous nature of the channel. Nevertheless, it is possible to develop a similar proof 
by using weakly typical sequences and the weak AEP (see for instance [3, Chapter 9]). 
The power constraint can be dealt with by introducing an error event that accounts for 
the sequences violating the power constraint in the codebook generation. 
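The closed-form expression in Theorem 2.9 comes from a short single-letter computation. It is sketched here for one channel use with $\mathbb{E}[X^2] \leq P$ and $X$ independent of $N$; the full blocklength-$n$ argument is more involved.
$$
\begin{aligned}
I(X;Y) &= h(Y) - h(Y \mid X) = h(Y) - h(N)\\
&\leq \tfrac{1}{2}\log\big(2\pi e\,(P + \sigma^2)\big) - \tfrac{1}{2}\log\big(2\pi e\,\sigma^2\big)
= \tfrac{1}{2}\log\Big(1 + \frac{P}{\sigma^2}\Big).
\end{aligned}
$$
The inequality holds because $\mathbb{E}[Y^2] = \mathbb{E}[X^2] + \sigma^2 \leq P + \sigma^2$, and the Gaussian distribution maximizes differential entropy under a second-moment constraint. Equality holds for $X \sim \mathcal{N}(0, P)$. For example, with $P/\sigma^2 = 15$ and logarithms in base 2, $C = \frac{1}{2}\log_2 16 = 2$ bits per channel use.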


Network information theory 


Shannon’s coding theorems characterize the fundamental limits of communication 
between two users. However, in many communication scenarios — for example, satel- 
lite broadcasting, cellular telephony, the Internet, and wireless sensor networks — the 
information is sent by one or more transmitting nodes to one or more receiving nodes 
Figure 2.4 Joint encoding of correlated sources.


over more or less intricate communication networks. The interactions between the users 
of said networks introduce a whole new range of fundamental communication aspects 
that are not present in the classical point-to-point problem, such as interference, user 
cooperation, and feedback. The central goal of network information theory is to provide 
a thorough understanding of these basic mechanisms, by characterizing the fundamental 
limits of communication systems with multiple users. In this section, we discuss some 
results of network information theory that are useful for understanding information- 
theoretic security in subsequent chapters. 


Distributed source coding 


Consider a DMS $(\mathcal{U}\mathcal{V}, p_{UV})$ that consists of two components $U$ and $V$ with joint
distribution puv. As shown in Figure 2.4, the two components are to be processed by 
a joint encoder and transmitted to a common destination over two noiseless channels. 
The joint distribution puv can be arbitrary and the symbols produced by U and V at 
any given point in time are statistically dependent; therefore, we refer to U and V as 
correlated sources. Since the channels to the destination do not introduce any errors, we 
may ask the following question: at what rates $R_1$ and $R_2$ can we transmit information
generated by U and V with an arbitrarily small probability of error? Since there is a 
common encoder and a common decoder, this problem reduces to the classical point-to- 
point problem and the solution follows naturally from Shannon’s source coding theorem: 
the messages can be reconstructed with an arbitrarily small probability of error at the 
receiver if and only if 


$$R_1 + R_2 > H(UV);$$


that is, the sum rate must be greater than the joint entropy of U and V. 

As illustrated in Figure 2.5, the problem becomes more challenging if instead of a 
joint encoder we consider two separate encoders. Here, each encoder observes only the 
realizations of the one source it is assigned to and does not know the output symbols of 
the other source. In this case, it is not immediately clear which encoding rates guarantee 
reconstruction with an arbitrarily small probability of error at the receiver. If we encode U 
at rate $R_1 > H(U)$ and $V$ at rate $R_2 > H(V)$, then the source coding theorem guarantees once again that an arbitrarily small probability of error is possible. But, in this case, the sum rate satisfies $R_1 + R_2 > H(U) + H(V)$, which, in general, is greater than the joint
entropy H(UV). 
Figure 2.5 Separate encoding of correlated sources (the Slepian–Wolf problem).


Surprisingly, the sum rate required by two separate encoders turns out to be the same as that required by a joint encoder; that is, $R_1 + R_2 > H(UV)$ is sufficient to reconstruct $U$ and $V$ with an arbitrarily small probability of error. In other words, there is no penalty in overall compression rate due to the fact that each encoder observes only the realizations of the source it has been assigned to. However, the decoder does require a minimum rate from each encoder. Specifically, each rate must be at least the average remaining uncertainty about one source given the other source, that is, $R_1 \geq H(U|V)$ and $R_2 \geq H(V|U)$. Formally, a code for the distributed source coding problem is defined as follows.


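Definition 2.21. A $(2^{kR_1}, 2^{kR_2}, k)$ source code $\mathcal{C}_k$ for the DMS $(\mathcal{U}\mathcal{V}, p_{UV})$ consists of

• two message sets $\mathcal{M}_1 = \llbracket 1, 2^{kR_1} \rrbracket$ and $\mathcal{M}_2 = \llbracket 1, 2^{kR_2} \rrbracket$;

• an encoding function $e_1 : \mathcal{U}^k \to \mathcal{M}_1$, which maps a sequence of $k$ source symbols $u^k$ to a message $m_1$;

• an encoding function $e_2 : \mathcal{V}^k \to \mathcal{M}_2$, which maps a sequence of $k$ source symbols $v^k$ to a message $m_2$;

• a decoding function $d : \mathcal{M}_1 \times \mathcal{M}_2 \to (\mathcal{U}^k \times \mathcal{V}^k) \cup \{?\}$, which maps a message pair $(m_1, m_2)$ to a pair of source sequences $(\hat{u}^k, \hat{v}^k) \in \mathcal{U}^k \times \mathcal{V}^k$ or an error message ?.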


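The performance of a code $\mathcal{C}_k$ is measured in terms of the average probability of error
$$P_e(\mathcal{C}_k) \triangleq \mathbb{P}\big[(\hat{U}^k, \hat{V}^k) \neq (U^k, V^k) \mid \mathcal{C}_k\big].$$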


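Definition 2.22. A rate pair $(R_1, R_2)$ is achievable if there exists a sequence of $(2^{kR_1}, 2^{kR_2}, k)$ codes $\{\mathcal{C}_k\}_{k \geq 1}$ such that
$$\lim_{k \to \infty} P_e(\mathcal{C}_k) = 0.$$
The achievable rate region is defined as
$$\mathcal{R}^{\text{SW}} \triangleq \mathrm{cl}\big(\{(R_1, R_2) : (R_1, R_2) \text{ is achievable}\}\big).$$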


The achievable rate region with separate encoding was first characterized by Slepian 
and Wolf; hence, the region is often called the Slepian—Wolf region and codes for the 
distributed source coding problem are often referred to as Slepian—Wolf codes. 
Theorem 2.10 (Slepian–Wolf theorem). The achievable rate region with separate encoding for a source $(\mathcal{U}\mathcal{V}, p_{UV})$ is
$$\mathcal{R}^{\text{SW}} = \left\{(R_1, R_2) : \begin{array}{l} R_1 \geq H(U|V)\\ R_2 \geq H(V|U)\\ R_1 + R_2 \geq H(UV) \end{array}\right\}.$$

Proof. We begin with the achievability part of the proof, which is based on joint typicality and random binning. Let $\epsilon > 0$ and $k \in \mathbb{N}^*$. Let $R_1 > 0$ and $R_2 > 0$ be rates to be specified later. We construct a $(2^{kR_1}, 2^{kR_2}, k)$ code $\mathcal{C}_k$ as follows.

• Binning. For each sequence $u^k \in \mathcal{T}_\epsilon^k(U)$, draw an index uniformly at random in the set $\llbracket 1, 2^{kR_1} \rrbracket$. For each sequence $v^k \in \mathcal{T}_\epsilon^k(V)$, draw an index uniformly at random in the set $\llbracket 1, 2^{kR_2} \rrbracket$. The index assignments define the encoding functions
$$e_1 : \mathcal{U}^k \to \llbracket 1, 2^{kR_1} \rrbracket \quad \text{and} \quad e_2 : \mathcal{V}^k \to \llbracket 1, 2^{kR_2} \rrbracket,$$
which are revealed to all parties.

• Encoder 1. Given the observation $u^k$, if $u^k \in \mathcal{T}_\epsilon^k(U)$, output $m_1 = e_1(u^k)$; otherwise output $m_1 = 1$.

• Encoder 2. Given the observation $v^k$, if $v^k \in \mathcal{T}_\epsilon^k(V)$, output $m_2 = e_2(v^k)$; otherwise output $m_2 = 1$.

• Decoder. Given messages $m_1$ and $m_2$, output $\hat{u}^k$ and $\hat{v}^k$ if they are the unique sequences such that $(\hat{u}^k, \hat{v}^k) \in \mathcal{T}_\epsilon^k(UV)$, $e_1(\hat{u}^k) = m_1$, and $e_2(\hat{v}^k) = m_2$; otherwise, output ?.


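The random variables that represent the randomly generated functions $e_1$ and $e_2$ are denoted by $E_1$ and $E_2$, and the random variable that represents the randomly generated code $\mathcal{C}_k$ is denoted by $\mathsf{C}_k$. We proceed to bound $\mathbb{E}[P_e(\mathsf{C}_k)]$, which can be expressed in terms of the following events:
$$
\begin{aligned}
\mathcal{E}_0 &= \{(U^k, V^k) \notin \mathcal{T}_\epsilon^k(UV)\},\\
\mathcal{E}_1 &= \{\exists\, \hat{u}^k \neq U^k : E_1(\hat{u}^k) = E_1(U^k) \text{ and } (\hat{u}^k, V^k) \in \mathcal{T}_\epsilon^k(UV)\},\\
\mathcal{E}_2 &= \{\exists\, \hat{v}^k \neq V^k : E_2(\hat{v}^k) = E_2(V^k) \text{ and } (U^k, \hat{v}^k) \in \mathcal{T}_\epsilon^k(UV)\},\\
\mathcal{E}_{12} &= \{\exists\, (\hat{u}^k, \hat{v}^k) : \hat{u}^k \neq U^k,\ \hat{v}^k \neq V^k,\ E_1(\hat{u}^k) = E_1(U^k),\ E_2(\hat{v}^k) = E_2(V^k)\\
&\qquad\quad \text{and } (\hat{u}^k, \hat{v}^k) \in \mathcal{T}_\epsilon^k(UV)\},
\end{aligned}
$$
since $\mathbb{E}[P_e(\mathsf{C}_k)] = \mathbb{P}[\mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_{12}]$. By the union bound,
$$\mathbb{E}[P_e(\mathsf{C}_k)] \leq \mathbb{P}[\mathcal{E}_0] + \mathbb{P}[\mathcal{E}_1] + \mathbb{P}[\mathcal{E}_2] + \mathbb{P}[\mathcal{E}_{12}]. \tag{2.7}$$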


By the AEP,
$$\mathbb{P}[\mathcal{E}_0] \leq \delta_\epsilon(k). \tag{2.8}$$
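Using Theorem 2.2,
$$
\begin{aligned}
\mathbb{P}[\mathcal{E}_1] &= \sum_{u^k, v^k} p_{U^kV^k}(u^k, v^k)\, \mathbb{P}\big[\exists\, \hat{u}^k \neq u^k : E_1(\hat{u}^k) = E_1(u^k) \text{ and } (\hat{u}^k, v^k) \in \mathcal{T}_\epsilon^k(UV)\big]\\
&\leq \sum_{u^k, v^k} p_{U^kV^k}(u^k, v^k) \sum_{\substack{\hat{u}^k \in \mathcal{T}_\epsilon^k(UV|v^k)\\ \hat{u}^k \neq u^k}} \mathbb{P}\big[E_1(\hat{u}^k) = E_1(u^k)\big]\\
&= \sum_{u^k, v^k} p_{U^kV^k}(u^k, v^k) \sum_{\substack{\hat{u}^k \in \mathcal{T}_\epsilon^k(UV|v^k)\\ \hat{u}^k \neq u^k}} \frac{1}{\lceil 2^{kR_1} \rceil}\\
&\leq \sum_{u^k, v^k} p_{U^kV^k}(u^k, v^k)\, \frac{1}{2^{kR_1}}\, \big|\mathcal{T}_\epsilon^k(UV|v^k)\big|\\
&\leq \sum_{u^k, v^k} p_{U^kV^k}(u^k, v^k)\, \frac{2^{k(H(U|V)+\delta(\epsilon))}}{2^{kR_1}}\\
&= 2^{k(H(U|V)+\delta(\epsilon)-R_1)}.
\end{aligned}
\tag{2.9}
$$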


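Similarly, we obtain the following bounds for $\mathbb{P}[\mathcal{E}_2]$ and $\mathbb{P}[\mathcal{E}_{12}]$:
$$\mathbb{P}[\mathcal{E}_2] \leq 2^{k(H(V|U)+\delta(\epsilon)-R_2)}, \tag{2.10}$$
$$\mathbb{P}[\mathcal{E}_{12}] \leq 2^{k(H(UV)+\delta(\epsilon)-(R_1+R_2))}. \tag{2.11}$$
Hence, if we choose the rates $R_1$ and $R_2$ such that
$$R_1 > H(U|V) + \delta(\epsilon), \quad R_2 > H(V|U) + \delta(\epsilon), \quad R_1 + R_2 > H(UV) + \delta(\epsilon),$$
and substitute (2.8)–(2.11) into (2.7), we obtain $\mathbb{E}[P_e(\mathsf{C}_k)] \leq \delta_\epsilon(k)$. By applying the selection lemma to the random variable $\mathsf{C}_k$ and the function $P_e$, we conclude that there exists a specific code $\mathcal{C}_k$ such that $P_e(\mathcal{C}_k) \leq \delta_\epsilon(k)$. Since $\epsilon$ can be chosen arbitrarily small, we conclude that
$$\left\{(R_1, R_2) : \begin{array}{l} R_1 > H(U|V)\\ R_2 > H(V|U)\\ R_1 + R_2 > H(UV) \end{array}\right\} \subseteq \mathcal{R}^{\text{SW}}.$$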


The converse part of the proof follows from the converse of the source coding theorem 
and is omitted. 
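The three constraints of Theorem 2.10 are easy to evaluate for a given joint distribution. The sketch below computes $H(U|V)$, $H(V|U)$, and $H(UV)$ from a joint pmf via the chain rule, then tests whether a rate pair satisfies them. The example source ($U$ uniform, $V = U \oplus N$ with $N \sim$ Bernoulli$(0.1)$) and the function names are hypothetical choices for illustration.

```python
from math import log2

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def slepian_wolf_bounds(p_uv):
    """Return (H(U|V), H(V|U), H(UV)) for a joint pmf {(u, v): probability}."""
    p_u, p_v = {}, {}
    for (u, v), q in p_uv.items():
        p_u[u] = p_u.get(u, 0.0) + q
        p_v[v] = p_v.get(v, 0.0) + q
    h_uv = entropy(p_uv)
    # chain rule: H(U|V) = H(UV) - H(V) and H(V|U) = H(UV) - H(U)
    return h_uv - entropy(p_v), h_uv - entropy(p_u), h_uv

def in_slepian_wolf_region(r1, r2, p_uv):
    """True if (r1, r2) satisfies the three inequalities of Theorem 2.10."""
    h_u_given_v, h_v_given_u, h_uv = slepian_wolf_bounds(p_uv)
    return r1 >= h_u_given_v and r2 >= h_v_given_u and r1 + r2 >= h_uv

# Hypothetical source: U uniform on {0, 1} and V = U XOR N with N ~ Bernoulli(0.1).
a = 0.1
p_uv = {(0, 0): (1 - a) / 2, (0, 1): a / 2, (1, 0): a / 2, (1, 1): (1 - a) / 2}
print(slepian_wolf_bounds(p_uv))               # about (0.469, 0.469, 1.469)
print(in_slepian_wolf_region(1.0, 0.5, p_uv))  # True
print(in_slepian_wolf_region(0.7, 0.7, p_uv))  # False, since 1.4 < H(UV)
```

For this source, separate encoding at $R_1 = H(U)$ and $R_2 = H(V)$ would need 2 bits per symbol pair in total. The Slepian–Wolf sum rate is only $H(UV) \approx 1.469$ bits.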


Figure 2.6 illustrates the typical shape of the Slepian–Wolf region $\mathcal{R}^{\text{SW}}$. A special case of the Slepian–Wolf problem is when one of the components of the DMS $(\mathcal{U}\mathcal{V}, p_{UV})$,
say V, is directly available at the decoder as side information and only U should be 
compressed. This problem is known as source coding with side information. The char- 
acterization of the minimum compression rate required to reconstruct U reliably at the 
decoder follows from Theorem 2.10. 


Figure 2.6 The Slepian–Wolf region for a DMS $(\mathcal{U}\mathcal{V}, p_{UV})$.


Corollary 2.4 (Source coding with side information). Consider a DMS $(\mathcal{U}\mathcal{V}, p_{UV})$ and assume that $(\mathcal{U}, p_U)$ should be compressed knowing that $(\mathcal{V}, p_V)$ is available as side information at the decoder. Then,
$$\inf\{R : R \text{ is an achievable compression rate}\} = H(U|V).$$


Corollary 2.4 plays a fundamental role for secret-key agreement in Chapters 4 and 6.


2.3.2 The multiple-access channel


In the previous problem, we assumed that the information generated by multiple sources is transmitted over noiseless channels. If these data are to be communicated over a common noisy channel to a single destination, we call this type of channel a multiple-access channel (MAC). As illustrated in Figure 2.7, a discrete memoryless multiple-access channel $(\mathcal{X}_1, \mathcal{X}_2, p_{Y|X_1X_2}, \mathcal{Y})$ consists of two finite input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, one finite output alphabet $\mathcal{Y}$, and transition probabilities $p_{Y|X_1X_2}$ such that
$$\forall n \geq 1,\ \forall (x_1^n, x_2^n, y^n) \in \mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n, \quad p_{Y^n|X_1^nX_2^n}(y^n | x_1^n, x_2^n) = \prod_{i=1}^n p_{Y|X_1X_2}(y_i | x_{1,i}, x_{2,i}).$$


Definition 2.23. A $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ for the MAC consists of

• two message sets $\mathcal{M}_1 = \llbracket 1, 2^{nR_1} \rrbracket$ and $\mathcal{M}_2 = \llbracket 1, 2^{nR_2} \rrbracket$;

• two encoding functions, $f_1 : \mathcal{M}_1 \to \mathcal{X}_1^n$ and $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^n$, which map a message $m_1$ or $m_2$ to a codeword $x_1^n$ or $x_2^n$;

• a decoding function $g : \mathcal{Y}^n \to (\mathcal{M}_1 \times \mathcal{M}_2) \cup \{?\}$, which maps each channel observation $y^n$ to a message pair $(\hat{m}_1, \hat{m}_2) \in \mathcal{M}_1 \times \mathcal{M}_2$ or an error message ?.

Figure 2.7 Communication over a two-user multiple-access channel.


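The messages $M_1$ and $M_2$ are assumed uniformly distributed in their respective sets, and the performance of a code $\mathcal{C}_n$ is measured in terms of the average probability of error
$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\big[(\hat{M}_1, \hat{M}_2) \neq (M_1, M_2) \mid \mathcal{C}_n\big].$$
Definition 2.24. A rate pair $(R_1, R_2)$ is achievable for the MAC if there exists a sequence of $(2^{nR_1}, 2^{nR_2}, n)$ codes $\{\mathcal{C}_n\}_{n \geq 1}$ such that
$$\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0.$$
The capacity region of a MAC is defined as
$$\mathcal{C}^{\text{MAC}} \triangleq \mathrm{cl}\big(\{(R_1, R_2) : (R_1, R_2) \text{ is achievable}\}\big).$$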


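The characterization of the capacity region requires the notion of a convex hull, which we define below.


Definition 2.25. The convex hull of a set $\mathcal{S} \subset \mathbb{R}^n$ is the set
$$\mathrm{co}(\mathcal{S}) \triangleq \left\{\sum_{i=1}^k \alpha_i s_i : k \geq 1,\ \alpha_i \in [0,1],\ \sum_{i=1}^k \alpha_i = 1,\ s_i \in \mathcal{S}\right\}.$$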

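Theorem 2.11 (Ahlswede and Liao). Consider a MAC $(\mathcal{X}_1, \mathcal{X}_2, p_{Y|X_1X_2}, \mathcal{Y})$. For any independent distributions $p_{X_1}$ on $\mathcal{X}_1$ and $p_{X_2}$ on $\mathcal{X}_2$, define the set $\mathcal{R}(p_{X_1}, p_{X_2})$ as
$$\mathcal{R}(p_{X_1}, p_{X_2}) \triangleq \left\{(R_1, R_2) : \begin{array}{l} 0 \leq R_1 \leq I(X_1; Y|X_2)\\ 0 \leq R_2 \leq I(X_2; Y|X_1)\\ 0 \leq R_1 + R_2 \leq I(X_1X_2; Y) \end{array}\right\},$$
where the joint distribution of $X_1$, $X_2$, and $Y$ factorizes as $p_{X_1} p_{X_2} p_{Y|X_1X_2}$. Then, the capacity region of a MAC is
$$\mathcal{C}^{\text{MAC}} = \mathrm{co}\Big(\bigcup_{p_{X_1} p_{X_2}} \mathcal{R}(p_{X_1}, p_{X_2})\Big).$$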


Proof. We provide only the achievability part of the proof, which is similar to the proof of Shannon's channel coding theorem and is based on joint typicality and random coding. Fix two independent probability distributions, $p_{X_1}$ on $\mathcal{X}_1$ and $p_{X_2}$ on $\mathcal{X}_2$. Let $0 < \epsilon < \mu_{X_1X_2Y}$, where
$$\mu_{X_1X_2Y} \triangleq \min_{(x_1, x_2, y) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}} p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{Y|X_1X_2}(y | x_1, x_2),$$
and let $n \in \mathbb{N}^*$. Let $R_1 > 0$ and $R_2 > 0$ be rates to be specified later. We construct a $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ as follows.


• Codebook construction. Construct a codebook for user 1 with $\lceil 2^{nR_1} \rceil$ codewords, labeled $x_1^n(m_1)$ with $m_1 \in \llbracket 1, 2^{nR_1} \rrbracket$, by generating the symbols $x_{1,i}(m_1)$ for $i \in \llbracket 1, n \rrbracket$ and $m_1 \in \llbracket 1, 2^{nR_1} \rrbracket$ independently according to $p_{X_1}$. Similarly, construct a codebook for user 2 with $\lceil 2^{nR_2} \rceil$ codewords, labeled $x_2^n(m_2)$ with $m_2 \in \llbracket 1, 2^{nR_2} \rrbracket$, by generating the symbols $x_{2,i}(m_2)$ for $i \in \llbracket 1, n \rrbracket$ and $m_2 \in \llbracket 1, 2^{nR_2} \rrbracket$ independently according to $p_{X_2}$. The codebooks are revealed to all encoders and decoders.

• Encoder 1. Given $m_1$, transmit $x_1^n(m_1)$.

• Encoder 2. Given $m_2$, transmit $x_2^n(m_2)$.

• Decoder. Given $y^n$, output $(\hat{m}_1, \hat{m}_2)$ if it is the unique message pair such that $(x_1^n(\hat{m}_1), x_2^n(\hat{m}_2), y^n) \in \mathcal{T}_\epsilon^n(X_1X_2Y)$; otherwise, output an error ?.


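The random variable that represents the randomly generated code $\mathcal{C}_n$ is denoted by $\mathsf{C}_n$, and we proceed to bound $\mathbb{E}[P_e(\mathsf{C}_n)]$. By virtue of the symmetry of the random code construction,
$$
\begin{aligned}
\mathbb{E}[P_e(\mathsf{C}_n)] &= \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[(\hat{M}_1, \hat{M}_2) \neq (M_1, M_2) \mid \mathsf{C}_n]\big]\\
&= \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[(\hat{M}_1, \hat{M}_2) \neq (M_1, M_2) \mid M_1 = 1, M_2 = 1, \mathsf{C}_n]\big].
\end{aligned}
$$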


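Therefore, the probability of error can be expressed in terms of the error events
$$\mathcal{E}_{ij} \triangleq \{(X_1^n(i), X_2^n(j), Y^n) \in \mathcal{T}_\epsilon^n(X_1X_2Y)\} \quad \text{for } i \in \llbracket 1, 2^{nR_1} \rrbracket \text{ and } j \in \llbracket 1, 2^{nR_2} \rrbracket$$
as
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{P}\Big[\mathcal{E}_{11}^c \cup \bigcup_{i \neq 1} \mathcal{E}_{i1} \cup \bigcup_{j \neq 1} \mathcal{E}_{1j} \cup \bigcup_{(i,j) : i \neq 1, j \neq 1} \mathcal{E}_{ij}\Big].$$
By the union bound,
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \mathbb{P}[\mathcal{E}_{11}^c] + \sum_{i \neq 1} \mathbb{P}[\mathcal{E}_{i1}] + \sum_{j \neq 1} \mathbb{P}[\mathcal{E}_{1j}] + \sum_{(i,j) : i \neq 1, j \neq 1} \mathbb{P}[\mathcal{E}_{ij}]. \tag{2.12}$$
By the joint AEP,
$$\mathbb{P}[\mathcal{E}_{11}^c] \leq \delta_\epsilon(n). \tag{2.13}$$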


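For $i \neq 1$, $X_1^n(i)$ is conditionally independent of $Y^n$ given $X_2^n(1)$; therefore, by Corollary 2.3,
$$\mathbb{P}[\mathcal{E}_{i1}] \leq 2^{-n(I(X_1; Y|X_2) - \delta(\epsilon))} \quad \text{for } i \neq 1. \tag{2.14}$$
Similarly, we can show
$$\mathbb{P}[\mathcal{E}_{1j}] \leq 2^{-n(I(X_2; Y|X_1) - \delta(\epsilon))} \quad \text{for } j \neq 1, \tag{2.15}$$
$$\mathbb{P}[\mathcal{E}_{ij}] \leq 2^{-n(I(X_1X_2; Y) - \delta(\epsilon))} \quad \text{for } i \neq 1 \text{ and } j \neq 1. \tag{2.16}$$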


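On substituting (2.13)–(2.16) into (2.12), we obtain
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n) + 2^{n(R_1 - I(X_1; Y|X_2) + \delta(\epsilon))} + 2^{n(R_2 - I(X_2; Y|X_1) + \delta(\epsilon))} + 2^{n(R_1 + R_2 - I(X_1X_2; Y) + \delta(\epsilon))}.$$
Hence, if we choose $R_1$ and $R_2$ to satisfy
$$R_1 < I(X_1; Y|X_2) - \delta(\epsilon), \quad R_2 < I(X_2; Y|X_1) - \delta(\epsilon), \quad R_1 + R_2 < I(X_1X_2; Y) - \delta(\epsilon),$$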


we obtain $\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n)$. By applying the selection lemma to the random variable $\mathsf{C}_n$ and the function $P_e$, we conclude that there exists a $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ such that $P_e(\mathcal{C}_n) \leq \delta_\epsilon(n)$. Since $\epsilon$ can be chosen arbitrarily small and since the distributions $p_{X_1}$ and $p_{X_2}$ are arbitrary, we conclude that the following region is achievable:
$$\bigcup_{p_{X_1} p_{X_2}} \left\{(R_1, R_2) : \begin{array}{l} 0 \leq R_1 < I(X_1; Y|X_2)\\ 0 \leq R_2 < I(X_2; Y|X_1)\\ 0 \leq R_1 + R_2 < I(X_1X_2; Y) \end{array}\right\} \subseteq \mathcal{C}^{\text{MAC}}.$$
Finally, it can be shown that time-sharing between different codes achieves the entire convex hull [3, Section 15.3]. We refer the reader to [3, 6] for the converse part of the proof.


The typical shape of the region $\mathcal{R}(p_{X_1}, p_{X_2})$ is illustrated in Figure 2.8. The corner points of the region can be explained in a very intuitive way. When encoder 1 views the signals sent by encoder 2 as noise, its maximum achievable rate is on the order of $I(X_1; Y)$, which is a direct consequence of the channel coding theorem. The decoder can then estimate the codeword $x_1^n$ and subtract it from the channel output sequence $y^n$, which allows encoder 2 to achieve a maximum rate on the order of $I(X_2; Y|X_1)$. This procedure is sometimes called successive cancellation and leads to the upper corner point of the region. The lower corner point corresponds to the symmetric case, in which encoder 2 views the signals sent by encoder 1 as noise.
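The corner points can be computed explicitly for a simple example. The sketch below evaluates $I(X_1;Y|X_2)$, $I(X_2;Y|X_1)$, and $I(X_1X_2;Y)$ from conditional entropies for independent inputs. The binary adder channel $Y = X_1 + X_2$ (real addition) with uniform inputs is an assumed example, and the function names are hypothetical.

```python
from math import log2

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def mix(pairs):
    """Mixture of output pmfs; pairs is a list of (weight, {y: p(y|...)})."""
    pmf = {}
    for weight, out in pairs:
        for y, w in out.items():
            pmf[y] = pmf.get(y, 0.0) + weight * w
    return pmf

def mac_bounds(p1, p2, channel):
    """Return (I(X1;Y|X2), I(X2;Y|X1), I(X1X2;Y)) in bits for independent inputs
    with pmfs p1, p2; channel(x1, x2) returns the dict {y: p(y|x1, x2)}."""
    h_y_x1x2 = sum(q1 * q2 * entropy(channel(x1, x2))
                   for x1, q1 in p1.items() for x2, q2 in p2.items())
    h_y_x1 = sum(q1 * entropy(mix([(q2, channel(x1, x2)) for x2, q2 in p2.items()]))
                 for x1, q1 in p1.items())
    h_y_x2 = sum(q2 * entropy(mix([(q1, channel(x1, x2)) for x1, q1 in p1.items()]))
                 for x2, q2 in p2.items())
    h_y = entropy(mix([(q1 * q2, channel(x1, x2))
                       for x1, q1 in p1.items() for x2, q2 in p2.items()]))
    return h_y_x2 - h_y_x1x2, h_y_x1 - h_y_x1x2, h_y - h_y_x1x2

def adder(x1, x2):
    # assumed example channel: noiseless real sum of two binary inputs
    return {x1 + x2: 1.0}

uniform = {0: 0.5, 1: 0.5}
i1, i2, i12 = mac_bounds(uniform, uniform, adder)
print(i1, i2, i12)                     # 1.0 1.0 1.5
# corner points reached by successive cancellation in either decoding order
print((i12 - i2, i2), (i1, i12 - i1))  # (0.5, 1.0) (1.0, 0.5)
```

Here the sum-rate constraint $I(X_1X_2;Y) = 1.5$ bits is active. The two corner points $(I(X_1;Y), I(X_2;Y|X_1)) = (0.5, 1)$ and $(I(X_1;Y|X_2), I(X_2;Y)) = (1, 0.5)$ are exactly the successive-cancellation points described above.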


The broadcast channel 


While a multiple-access channel considers multiple sources and one destination, the 
broadcast channel (BC for short) considers a single information source that transmits 
to multiple users. Applications of the BC include the downlink channel of a satellite or 
of a base station in a mobile communication network, and the wiretap channel which 
is studied in detail in Chapter 3 and Chapter 5. As illustrated in Figure 2.9, a discrete 
Figure 2.8 Typical shape of the rate region $\mathcal{R}(p_{X_1}, p_{X_2})$ of the multiple-access channel for fixed input distributions $p_{X_1}$ and $p_{X_2}$.

Figure 2.9 Communication over a two-user broadcast channel.


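memoryless two-user broadcast channel $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ consists of a finite input alphabet $\mathcal{X}$, two finite output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and transition probabilities $p_{YZ|X}$ such that
$$\forall n \geq 1,\ \forall (x^n, y^n, z^n) \in \mathcal{X}^n \times \mathcal{Y}^n \times \mathcal{Z}^n, \quad p_{Y^nZ^n|X^n}(y^n, z^n | x^n) = \prod_{i=1}^n p_{YZ|X}(y_i, z_i | x_i).$$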


We assume that the transmitter wants to send a common message $M_0$ to both receivers and a private message $M_1$ to the receiver observing $Y^n$. The receiver observing $Z^n$ is called the "weak" user, while the receiver observing $Y^n$ is called the "strong" user.


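Definition 2.26. A $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ for the BC consists of

• two message sets $\mathcal{M}_0 = \llbracket 1, 2^{nR_0} \rrbracket$ and $\mathcal{M}_1 = \llbracket 1, 2^{nR_1} \rrbracket$;

• an encoding function $f : \mathcal{M}_0 \times \mathcal{M}_1 \to \mathcal{X}^n$, which maps a message pair $(m_0, m_1)$ to a codeword $x^n$;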




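• a decoding function $g : \mathcal{Y}^n \to (\mathcal{M}_0 \times \mathcal{M}_1) \cup \{?\}$, which maps each channel observation $y^n$ to a message pair $(\hat{m}_0, \hat{m}_1) \in \mathcal{M}_0 \times \mathcal{M}_1$ or an error message ?;

• a decoding function $h : \mathcal{Z}^n \to \mathcal{M}_0 \cup \{?\}$, which maps each channel observation $z^n$ to a message $\tilde{m}_0 \in \mathcal{M}_0$ or an error message ?.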


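Messages $M_0$ and $M_1$ are assumed uniformly distributed in their respective sets, and the performance of a code $\mathcal{C}_n$ is measured in terms of the average probability of error
$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\big[\tilde{M}_0 \neq M_0 \text{ or } (\hat{M}_0, \hat{M}_1) \neq (M_0, M_1) \mid \mathcal{C}_n\big].$$


Definition 2.27. A rate pair $(R_0, R_1)$ is achievable for the BC if there exists a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes $\{\mathcal{C}_n\}_{n \geq 1}$ such that
$$\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0.$$
The capacity region of a BC is defined as
$$\mathcal{C}^{\text{BC}} \triangleq \mathrm{cl}\big(\{(R_0, R_1) : (R_0, R_1) \text{ is achievable}\}\big).$$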


As in many other fundamental problems of network information theory, determining 
the capacity region of the broadcast channel turns out to be a very difficult task. Therefore, 
we provide only an achievable rate region, which, in general, is strictly smaller than the 
capacity region. 


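Theorem 2.12 (Bergmans and Gallager). Consider a BC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. For any joint distribution $p_{UX}$ on $\mathcal{U} \times \mathcal{X}$, define the set $\mathcal{R}(p_{UX})$ as
$$\mathcal{R}(p_{UX}) \triangleq \left\{(R_0, R_1) : \begin{array}{l} 0 \leq R_0 \leq \min(I(U; Y), I(U; Z))\\ 0 \leq R_1 \leq I(X; Y|U) \end{array}\right\},$$
where the joint distribution of $U$, $X$, $Y$, and $Z$ factorizes as $p_{UX} p_{YZ|X}$. Then,
$$\mathcal{R}^{\text{BC}} \triangleq \mathrm{co}\Big(\bigcup_{p_{UX}} \mathcal{R}(p_{UX})\Big) \subseteq \mathcal{C}^{\text{BC}}.$$
In addition, the cardinality of the auxiliary random variable $U$ can be limited to $|\mathcal{U}| \leq \min(|\mathcal{X}|, |\mathcal{Y}|, |\mathcal{Z}|)$.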


Proof. The proof that $\mathcal{R}^{\text{BC}} \subseteq \mathcal{C}^{\text{BC}}$ is based on joint typicality, random coding, and a code construction called superposition coding. As illustrated in Figure 2.10, the idea of superposition coding is to create a codebook with $\lceil 2^{nR_0} \rceil$ codewords for the weak user and to superpose, on every one of its codewords, a codebook with $\lceil 2^{nR_1} \rceil$ codewords for the strong user. The codewords $u^n$ are often called "cloud centers," while the codewords $x^n$ are called "satellite codewords." Formally, fix a joint probability distribution $p_{UX}$ on $\mathcal{U} \times \mathcal{X}$. Let $0 < \epsilon < \mu_{XYU}$, where $\mu_{XYU} \triangleq \min p_{XU}(x, u)\, p_{Y|XU}(y | x, u)$, and let $n \in \mathbb{N}^*$. Let $R_0 > 0$ and $R_1 > 0$ be rates to be specified later. We construct a $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ as follows.


• Codebook construction. Construct a codebook with $\lceil 2^{nR_0} \rceil$ codewords, labeled $u^n(m_0)$ with $m_0 \in \llbracket 1, 2^{nR_0} \rrbracket$, by generating the symbols $u_i(m_0)$ for $i \in \llbracket 1, n \rrbracket$ and $m_0 \in \llbracket 1, 2^{nR_0} \rrbracket$ independently according to $p_U$. For each $u^n(m_0)$ with $m_0 \in \llbracket 1, 2^{nR_0} \rrbracket$, generate another codebook with $\lceil 2^{nR_1} \rceil$ codewords, labeled $x^n(m_0, m_1)$ with $m_1 \in \llbracket 1, 2^{nR_1} \rrbracket$, by generating the symbols $x_i(m_0, m_1)$ for $i \in \llbracket 1, n \rrbracket$ and $m_1 \in \llbracket 1, 2^{nR_1} \rrbracket$ independently according to $p_{X|U = u_i(m_0)}$. The codebooks are revealed to the encoder and both decoders.

• Encoder. Given $(m_0, m_1)$, transmit $x^n(m_0, m_1)$.

• Decoder for weak user. Given $z^n$, output $\tilde{m}_0$ if it is the unique message such that $(u^n(\tilde{m}_0), z^n) \in \mathcal{T}_\epsilon^n(UZ)$; otherwise, output an error ?.

• Decoder for strong user. Given $y^n$, output $(\hat{m}_0, \hat{m}_1)$ if it is the unique message pair such that $(u^n(\hat{m}_0), y^n) \in \mathcal{T}_\epsilon^n(UY)$ and $(u^n(\hat{m}_0), x^n(\hat{m}_0, \hat{m}_1), y^n) \in \mathcal{T}_\epsilon^n(UXY)$; otherwise, output an error ?.

Figure 2.10 Superposition coding. A codebook for the strong user is superposed to every codeword of the codebook for the weak user.


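The random variable that represents the randomly generated codebook $\mathcal{C}_n$ is denoted by $\mathsf{C}_n$, and we proceed to bound $\mathbb{E}[P_e(\mathsf{C}_n)]$. From the symmetry of the random code construction, notice that
$$
\begin{aligned}
\mathbb{E}[P_e(\mathsf{C}_n)] &= \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\tilde{M}_0 \neq M_0 \text{ or } (\hat{M}_0, \hat{M}_1) \neq (M_0, M_1) \mid \mathsf{C}_n]\big]\\
&= \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\tilde{M}_0 \neq M_0 \text{ or } (\hat{M}_0, \hat{M}_1) \neq (M_0, M_1) \mid M_0 = 1, M_1 = 1, \mathsf{C}_n]\big].
\end{aligned}
$$
Therefore, $\mathbb{E}[P_e(\mathsf{C}_n)]$ can be expressed in terms of the events
$$
\begin{aligned}
\mathcal{E}_i &= \{(U^n(i), Y^n) \in \mathcal{T}_\epsilon^n(UY)\} && \text{for } i \in \llbracket 1, 2^{nR_0} \rrbracket,\\
\mathcal{F}_i &= \{(U^n(i), Z^n) \in \mathcal{T}_\epsilon^n(UZ)\} && \text{for } i \in \llbracket 1, 2^{nR_0} \rrbracket,\\
\mathcal{G}_{ij} &= \{(U^n(i), X^n(i, j), Y^n) \in \mathcal{T}_\epsilon^n(UXY)\} && \text{for } i \in \llbracket 1, 2^{nR_0} \rrbracket \text{ and } j \in \llbracket 1, 2^{nR_1} \rrbracket
\end{aligned}
$$
as
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{P}\Big[\mathcal{E}_1^c \cup \bigcup_{i \neq 1} \mathcal{E}_i \cup \mathcal{F}_1^c \cup \bigcup_{i \neq 1} \mathcal{F}_i \cup \mathcal{G}_{11}^c \cup \bigcup_{j \neq 1} \mathcal{G}_{1j}\Big]$$
and, by the union bound,
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \mathbb{P}[\mathcal{E}_1^c] + \sum_{i \neq 1} \mathbb{P}[\mathcal{E}_i] + \mathbb{P}[\mathcal{F}_1^c] + \sum_{i \neq 1} \mathbb{P}[\mathcal{F}_i] + \mathbb{P}[\mathcal{G}_{11}^c] + \sum_{j \neq 1} \mathbb{P}[\mathcal{G}_{1j}]. \tag{2.17}$$
By the joint AEP,
$$\mathbb{P}[\mathcal{E}_1^c] \leq \delta_\epsilon(n), \quad \mathbb{P}[\mathcal{F}_1^c] \leq \delta_\epsilon(n), \quad \text{and} \quad \mathbb{P}[\mathcal{G}_{11}^c] \leq \delta_\epsilon(n). \tag{2.18}$$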
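For $i \neq 1$, notice that $U^n(i)$ is independent of $Y^n$ and $Z^n$; therefore, by Corollary 2.2,
$$\mathbb{P}[\mathcal{E}_i] \leq 2^{-n(I(U;Y) - \delta(\epsilon))} \quad \text{and} \quad \mathbb{P}[\mathcal{F}_i] \leq 2^{-n(I(U;Z) - \delta(\epsilon))} \quad \text{for } i \neq 1. \tag{2.19}$$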


For $j \neq 1$, $X^n(1, j)$ is conditionally independent of $Y^n$ given $U^n(1)$; therefore, by Corollary 2.3,
$$\mathbb{P}[\mathcal{G}_{1j}] \leq 2^{-n(I(X;Y|U) - \delta(\epsilon))} \quad \text{for } j \neq 1. \tag{2.20}$$
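On substituting (2.18), (2.19), and (2.20) into (2.17), we obtain
$$\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n) + 2^{n(R_0 - I(U;Y) + \delta(\epsilon))} + 2^{n(R_0 - I(U;Z) + \delta(\epsilon))} + 2^{n(R_1 - I(X;Y|U) + \delta(\epsilon))}.$$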


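Hence, if we choose the rates $R_0$ and $R_1$ to satisfy
$$R_0 < \min(I(U;Y), I(U;Z)) - \delta(\epsilon), \quad R_1 < I(X;Y|U) - \delta(\epsilon),$$
we obtain $\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta_\epsilon(n)$. By applying the selection lemma to the random variable $\mathsf{C}_n$ and the function $P_e$, we conclude that there exists a $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ such that $P_e(\mathcal{C}_n) \leq \delta_\epsilon(n)$. Since $\epsilon$ can be chosen arbitrarily small and since the distribution $p_{UX}$ is arbitrary, we conclude that
$$\left\{(R_0, R_1) : \begin{array}{l} 0 \leq R_0 < \min(I(U;Y), I(U;Z))\\ 0 \leq R_1 < I(X;Y|U) \end{array}\right\} \subseteq \mathcal{C}^{\text{BC}}.$$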


Since $p_{UX}$ is arbitrary and since it is possible to perform time-sharing, the theorem follows. The bound for the cardinality of the random variable $U$ follows from Carathéodory's theorem, and we refer the reader to [3] for details.
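To see superposition coding trade the two rates, consider a hypothetical binary symmetric broadcast channel. Here $Y$ and $Z$ are obtained from $X$ through BSCs with crossover probabilities $p_1 \leq p_2 \leq 1/2$. The sketch below makes the following assumptions:

- $U \sim$ Bernoulli$(1/2)$.
- $X = U \oplus V$ with $V \sim$ Bernoulli$(\beta)$ independent of $U$.

Under these assumptions:

- $I(U;Y) = 1 - H_b(\beta \star p_1)$,
- $I(U;Z) = 1 - H_b(\beta \star p_2)$,
- $I(X;Y|U) = H_b(\beta \star p_1) - H_b(p_1)$,

where $H_b$ is the binary entropy function and $a \star b = a(1-b) + (1-a)b$. Sweeping $\beta$ traces the corner points of $\mathcal{R}(p_{UX})$ for this family of input distributions only. The function names are illustrative.

```python
from math import log2

def binary_entropy(p):
    """H_b(p) in bits, with the convention 0 log 0 = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def star(a, b):
    # crossover probability of two cascaded BSCs: a(1 - b) + (1 - a)b
    return a * (1 - b) + (1 - a) * b

def superposition_corner(beta, p1, p2):
    """Corner (R0, R1) of R(p_UX) for a hypothetical binary symmetric BC with
    Y = X xor Bern(p1) (strong user) and Z = X xor Bern(p2) (weak user),
    using U ~ Bern(1/2) and X = U xor V with V ~ Bern(beta)."""
    i_u_y = 1 - binary_entropy(star(beta, p1))                           # I(U;Y)
    i_u_z = 1 - binary_entropy(star(beta, p2))                           # I(U;Z)
    i_x_y_given_u = binary_entropy(star(beta, p1)) - binary_entropy(p1)  # I(X;Y|U)
    return min(i_u_y, i_u_z), i_x_y_given_u

for beta in (0.0, 0.1, 0.25, 0.5):
    r0, r1 = superposition_corner(beta, p1=0.05, p2=0.2)
    print(beta, round(r0, 3), round(r1, 3))
```

The two ends of the sweep behave as expected:

- At $\beta = 0$ all resources go to the common message: $R_0 = 1 - H_b(p_2)$ and $R_1 = 0$.
- At $\beta = 1/2$ the cloud centers carry nothing, and the strong user gets $R_1 = 1 - H_b(p_1)$.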


Bibliographical notes 


The definitions of typical sequences and their properties are those described in the 
textbooks [3, 4, 6]. The notion of d-separation is a known result in statistical inference, 
and we have used the definition provided by Kramer [8]. 

There are several ways of proving the channel coding theorem, see for instance [2, 
3, 4, 5], and our presentation is based on the approach in [3, 4]. The Slepian—Wolf 
theorem was established by Slepian and Wolf in [9]. Additional examples of results 
proved with random binning with side information can be found in [10] and [11]. 
The capacity of the two-user multiple-access channel with independent messages was
obtained independently by Ahlswede [12] and Liao [13]. The broadcast channel model 
was proposed by Cover [14]. Bergmans [15] and Gallager [16] proved that the achievable 
region obtained in this chapter is the capacity region of a subclass of broadcast chan- 
nels called physically degraded broadcast channels. Surveys of known results about 
network information theory can be found in Cover’s survey paper [17] or Kramer’s 
monograph [6]. 


Part II


Information-theoretic security 


3.1 


Secrecy capacity 


In this chapter, we develop the notion of secrecy capacity, which plays a central role 
in physical-layer security. The secrecy capacity characterizes the fundamental limit of 
secure communications over noisy channels, and it is essentially the counterpart to 
the usual point-to-point channel capacity when communications are subject not only 
to reliability constraints but also to an information-theoretic secrecy requirement. It 
is inherently associated with a channel model called the wiretap channel, which is a 
broadcast channel in which one of the receivers is treated as an adversary. This adversarial 
receiver, which we call the eavesdropper to emphasize its passiveness, should remain 
ignorant of the messages transmitted over the channel. The mathematical tools, and 
especially the random-coding argument, presented in this chapter are the basis for most 
of the theoretical research in physical-layer security, and we use them extensively in 
subsequent chapters. 

We start with a review of Shannon’s model of secure communications (Section 3.1), 
and then we informally discuss the problem of secure communications over noisy chan- 
nels (Section 3.2). The intuition we develop from loose arguments is useful to grasp 
the concepts underlying the proofs of the secrecy capacity and motivates a discussion 
of the choice of an information-theoretic secrecy metric (Section 3.3). We then study 
in detail the fundamental limits of secure communication over degraded wiretap chan- 
nels (Section 3.4) and broadcast channels with confidential messages (Section 3.5). We 
also discuss the multiplexing of secure and non-secure messages as well as the role of 
feedback for securing communications (Section 3.6). Finally, we conclude the chapter 
with a summary of the lessons learned from the analysis of fundamental limits and a 
review of the explicit and implicit assumptions used in the models (Section 3.7). The 
Gaussian wiretap channel and its extensions to multiple-input multiple-output channels 
and wireless channels are considered separately in Chapter 5. 


3.1 Shannon’s cipher system


Shannon proposed the idea of measuring quantitatively the secrecy level of encryption 
systems on the basis of his mathematical theory of communication. Shannon’s model of 
secure communications, which is often called Shannon’s cipher system, is illustrated in
Figure 3.1; it considers a situation in which a transmitter communicates with a legitimate 
receiver over a noiseless channel, while an eavesdropper overhears all signals sent over 
Figure 3.1 Shannon’s cipher system.


the channel. To prevent the eavesdropper from retrieving information, the transmitter 
encodes his messages into codewords by means of a secret key, which is known to the legitimate receiver but unknown to the eavesdropper.¹ Messages, codewords, and keys are represented by the random variables $M \in \mathcal{M}$, $X \in \mathcal{X}$, and $K \in \mathcal{K}$, respectively, and we assume that $K$ is independent of $M$. The encoding function is denoted by $e : \mathcal{M} \times \mathcal{K} \to \mathcal{X}$, the decoding function is denoted by $d : \mathcal{X} \times \mathcal{K} \to \mathcal{M}$, and we refer to the pair $(e, d)$ as a coding scheme. The legitimate receiver is assumed to retrieve messages without error, that is,

$$M = d(X, K) \quad \text{if} \quad X = e(M, K).$$


Although the eavesdropper has no knowledge about the secret key, he is assumed to 
know the encoding function e and the decoding function d. 

To measure secrecy with respect to Eve in terms of an information-theoretic quantity, 
it is natural to consider the conditional entropy H(M|X), which we call the eavesdrop- 
per’s equivocation. Intuitively, the equivocation represents Eve’s uncertainty about the 
messages after intercepting the codewords. A coding scheme is said to achieve perfect 
secrecy if 


$$H(M|X) = H(M) \quad \text{or, equivalently,} \quad I(M; X) = 0.$$


We call the quantity I(M; X) the leakage of information to the eavesdropper. In other 
words, perfect secrecy is achieved if codewords X are statistically independent of mes- 
sages M. This definition of security differs from the traditional assessment based on 
computational complexity not only because it provides a quantitative metric to measure 
secrecy but also because it disregards the computational power of the eavesdropper. 
Perfect secrecy guarantees that the eavesdropper’s optimal attack is to guess the message 
M at random and that there exists no algorithm that could extract any information about 
M from X. 


Proposition 3.1. If a coding scheme for Shannon’s cipher system achieves perfect secrecy, then

$$H(K) \geq H(M).$$


¹ In cryptography, it is customary to call a message a plaintext, and a codeword a ciphertext or a cryptogram.
We adopt instead the nomenclature prevalent in information theory and coding theory. 
Proof. Consider a coding scheme that achieves perfect secrecy. By assumption, H(M|X) = H(M); in addition, since messages M are decoded without errors upon observing X and K, Fano’s inequality also ensures that H(M|XK) = 0. Consequently,

$$\begin{aligned}
H(K) &\stackrel{(a)}{\geq} H(K) - H(K|XM)\\
&\stackrel{(b)}{\geq} H(K|X) - H(K|XM)\\
&= I(K; M|X)\\
&= H(M|X) - H(M|KX)\\
&= H(M|X)\\
&= H(M).
\end{aligned}$$


Inequality (a) follows from $H(K|XM) \geq 0$ and inequality (b) follows from $H(K) \geq H(K|X)$ because conditioning does not increase entropy; the last two equalities follow from H(M|KX) = 0 and from perfect secrecy, respectively.


In other words, Proposition 3.1 states that it is necessary to use at least one secret-key 
bit for each message bit to achieve perfect secrecy. If the number of possible messages, 
keys, and codewords is the same, it is possible to obtain a more precise result and to 
establish necessary and sufficient conditions for communication in perfect secrecy. 


Theorem 3.1. If $|\mathcal{M}| = |\mathcal{X}| = |\mathcal{K}|$, a coding scheme for Shannon’s cipher system achieves perfect secrecy if and only if

• for each pair $(m, x) \in \mathcal{M} \times \mathcal{X}$, there exists a unique key $k \in \mathcal{K}$ such that $x = e(m, k)$;
• the key $K$ is uniformly distributed in $\mathcal{K}$.


Proof. First, we establish that the conditions of Theorem 3.1 are necessary. Consider a coding scheme that achieves perfect secrecy with $|\mathcal{M}| = |\mathcal{X}| = |\mathcal{K}|$. Note that $p_X(x) > 0$ for all $x \in \mathcal{X}$; otherwise some codewords would never be used and could be removed from $\mathcal{X}$, which would violate the assumption $|\mathcal{M}| = |\mathcal{X}|$. Since $M$ and $X$ are independent, this implies $p_{X|M}(x|m) = p_X(x) > 0$ for all pairs $(m, x) \in \mathcal{M} \times \mathcal{X}$. In other words, for all messages $m \in \mathcal{M}$, the encoder can output all possible codewords in $\mathcal{X}$; therefore,

$$\forall m \in \mathcal{M} \quad \mathcal{X} = \{e(m, k) : k \in \mathcal{K}\}.$$

Because we have assumed $|\mathcal{X}| = |\mathcal{K}|$, for all pairs $(m, x) \in \mathcal{M} \times \mathcal{X}$ there must be a unique key $k \in \mathcal{K}$ such that $x = e(m, k)$. Now, fix an arbitrary codeword $x^* \in \mathcal{X}$. For every message $m \in \mathcal{M}$, let $k_m$ be the unique key such that $x^* = e(m, k_m)$. Then, since $K$ is independent of $M$, $p_K(k_m) = p_{X|M}(x^*|m)$ and $\mathcal{K} = \{k_m : m \in \mathcal{M}\}$. Using Bayes’ rule, we obtain

$$p_K(k_m) = p_{X|M}(x^*|m) = \frac{p_{M|X}(m|x^*)\, p_X(x^*)}{p_M(m)} = p_X(x^*),$$

where the last equality follows from $p_{M|X}(m|x^*) = p_M(m)$ by virtue of the independence of $M$ and $X$. Therefore, $p_K(k_m)$ takes on the same value for all $m \in \mathcal{M}$, which implies that $K$ is uniformly distributed in $\mathcal{K}$.
Figure 3.2 Vernam’s cipher (one-time pad) illustrated for $\mathcal{M} = \{0, 1\}$.


We now show that the conditions of Theorem 3.1 are also sufficient. Since $|\mathcal{M}| = |\mathcal{X}| = |\mathcal{K}|$, we can assume without loss of generality that $\mathcal{M} = \mathcal{X} = \mathcal{K} = \llbracket 0, |\mathcal{M}| - 1 \rrbracket$. Consider now the coding scheme illustrated in Figure 3.2, called a Vernam cipher or one-time pad. To send a message $m \in \mathcal{M}$, Alice transmits $x = m \oplus k$, where $k$ is the realization of a key $K$, which is independent of the message and with uniform distribution on $\mathcal{M}$, and $\oplus$ is the modulo-$|\mathcal{M}|$ addition. Since $k$ is known to Bob, he can decode the message $m$ from the codeword $x$ without error by computing

$$x \ominus k = m \oplus k \ominus k = m,$$

where $\ominus$ is the modulo-$|\mathcal{M}|$ subtraction. In addition, this encoding procedure guarantees that, for all $x \in \mathcal{X}$,

$$p_X(x) = \sum_{k \in \mathcal{M}} p_{X|K}(x|k)\, p_K(k) = \sum_{k \in \mathcal{M}} p_M(x \ominus k)\, \frac{1}{|\mathcal{M}|} = \frac{1}{|\mathcal{M}|},$$


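and, consequently,

$$\begin{aligned}
I(M; X) &= H(X) - H(X|M)\\
&\stackrel{(a)}{=} H(X) - H(K|M)\\
&\stackrel{(b)}{=} H(X) - H(K)\\
&= \log|\mathcal{M}| - \log|\mathcal{M}|\\
&= 0,
\end{aligned}$$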


where (a) follows from H(X|M) = H(K|M) because there is a one-to-one mapping between X and K given M and (b) follows from H(K|M) = H(K) because M and K are independent. Notice that this result holds for any probability distribution $p_M$ of the message for which $p_M(m) > 0$ for all $m \in \mathcal{M}$.
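To make the sufficiency argument concrete, the following Python sketch computes the leakage $I(M;X)$ exactly for the modular one-time pad $x = m \oplus k$ on $\llbracket 0, q-1 \rrbracket$. The function name `otp_leakage` and the distributions used below are illustrative assumptions, not part of the text. With a uniform key, the leakage is zero even for a deliberately non-uniform message distribution, as predicted above; with a non-uniform key and uniform messages, it equals $\log q - H(K) > 0$, in line with the necessity of a uniform key in Theorem 3.1.

```python
from fractions import Fraction
from math import log2

def otp_leakage(p_m, p_k):
    """Exact leakage I(M; X) in bits for X = (M + K) mod q, with M and K independent.

    p_m and p_k are probability vectors (lists of Fractions) over {0, ..., q-1}.
    Returns (leakage, p_x), where p_x is the exact distribution of the codeword X.
    """
    q = len(p_m)
    if len(p_k) != q:
        raise ValueError("message and key alphabets must have the same size")
    # Joint distribution p_{MX}(m, x) = p_M(m) * p_K((x - m) mod q)
    joint = {(m, x): p_m[m] * p_k[(x - m) % q] for m in range(q) for x in range(q)}
    p_x = [sum(joint[(m, x)] for m in range(q)) for x in range(q)]
    leakage = 0.0
    for (m, x), p in joint.items():
        if p > 0:
            # I(M; X) = sum_{m,x} p(m, x) log2[p(m, x) / (p(m) p(x))]
            leakage += float(p) * log2(float(p / (p_m[m] * p_x[x])))
    return leakage, p_x

# Uniform key, non-uniform messages: no leakage and uniform codewords.
p_m = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
leak, p_x = otp_leakage(p_m, [Fraction(1, 4)] * 4)
print(leak, p_x)

# Non-uniform key, uniform messages: leakage equals log2(q) - H(K) > 0.
p_k = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
leak_bad, _ = otp_leakage([Fraction(1, 4)] * 4, p_k)
print(leak_bad)
```

Exact rational arithmetic (`fractions.Fraction`) is used for the distributions so that $p_X(x) = 1/q$ is verified exactly rather than up to rounding.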


The fact that a one-time pad guarantees perfect secrecy is a result usually referred to as the “crypto lemma,” which holds under very general conditions; in particular, the finite alphabet $\mathcal{M}$ can be replaced by a compact abelian group $G$.²


² An abelian group G is a commutative group that need not be finite. The assumption that G is compact guarantees that its Haar measure is finite so that it is possible to define a uniform probability distribution over G.


Figure 3.3 Communication over a binary erasure wiretap channel.


Lemma 3.1 (Crypto lemma, Forney). Let (G, +) be a compact abelian group with 
binary operation + and let X = M + K, where M and K are random variables over 
G and K is independent of M and uniform over G. Then X is independent of M and 
uniform over G. 


Although Theorem 3.1 shows the existence of coding schemes that achieve perfect 
secrecy, it provides an unsatisfactory result. In fact, since a one-time pad requires a new 
key bit for each message bit, it essentially replaces the problem of secure communication 
by that of secret-key distribution. Nevertheless, we show in the next sections that this 
disappointing result stems from the absence of noise at the physical layer in the model; 
in particular, Shannon’s cipher system does not take into account the noise affecting the 
eavesdropper’s observation of the codewords. 


Remark 3.1. Requiring perfect secrecy is much more stringent than preventing the 
eavesdropper from decoding correctly. To see this, assume for simplicity that messages 
are taken from the set $\llbracket 1, M \rrbracket$ and that each of them is equally likely, in which case the eavesdropper minimizes his probability of decoding error $P_e$ by performing maximum-likelihood decoding. Since the a-priori distribution of the message $M$ is uniform over $\llbracket 1, M \rrbracket$, the condition $H(M|X) = H(M)$ ensures that $p_{M|X}(m|x) = 1/M$ for all messages $m \in \mathcal{M}$ and codewords $x \in \mathcal{X}$ or, equivalently, that the probability of error under maximum-likelihood decoding is $P_e = (M - 1)/M$. In contrast, evaluating secrecy in terms of the non-decodability of the messages would merely guarantee that the probability of error under maximum-likelihood decoding is bounded away from zero, that is, $P_e \geq \epsilon$ for some fixed $\epsilon > 0$.


3.2 Secure communication over a noisy channel


Before we study secrecy capacity in detail, it is instructive to consider the effect of 
noise with the simple model illustrated in Figure 3.3, which is called a binary erasure 
wiretap channel. This channel is a special case of more general models that are studied in 
Section 3.4 and Section 3.5. Here, a transmitter communicates messages to a legitimate 
receiver by sending binary codewords of length n over a noiseless channel, while an 
eavesdropper observes a corrupted version of these codewords at the output of a binary 
erasure channel with erasure probability $\epsilon \in (0, 1)$. Messages are taken from the set $\llbracket 1, M \rrbracket$ uniformly at random, and are represented by the random variable $M$. Codewords are denoted by the random variable $X^n \in \{0, 1\}^n$ and the eavesdropper’s observation is denoted by $Z^n \in \{0, 1, ?\}^n$. We assume that different messages are always encoded into different codewords, so that the reliable transmission rate is $(1/n)H(M) = (1/n)\log M$.

Rather than requiring perfect secrecy, that is, exact statistical independence of $M$ and $Z^n$, we consider a more tractable condition and we say that a coding scheme is secure if it guarantees $\lim_{n\to\infty} I(M; Z^n) = 0$. The key difficulty is now to determine the type of encoder that could enforce this condition. To obtain some insight, we consider a specific coding scheme for the model in Figure 3.3.


Example 3.1. Assume that messages are taken uniformly at random from the set $\llbracket 1, 2 \rrbracket$ so that H(M) = 1, and let n be arbitrary. Let $\mathcal{C}_1$ be the set of binary sequences of length n with odd parity and let $\mathcal{C}_2$ be the set of binary sequences of length n with even parity. To send a message $m \in \{1, 2\}$, the emitter transmits a sequence $x^n$ chosen uniformly at random in $\mathcal{C}_m$. The rate of the coding scheme is simply 1/n. Now, assume that the eavesdropper observes a sequence $Z^n$ with k erasures. If k > 0, the parity of the erased bits is just as likely to be even as it is to be odd. If k = 0, the eavesdropper knows perfectly which codeword was sent and thus knows its parity. To analyze the eavesdropper’s equivocation formally, we introduce the random variable $E \in \{0, 1\}$ such that

$$E \triangleq \begin{cases} 0 & \text{if } Z^n \text{ contains no erasure;}\\ 1 & \text{otherwise.} \end{cases}$$


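We can then lower bound the equivocation as

$$\begin{aligned}
H(M|Z^n) &\geq H(M|Z^nE)\\
&\stackrel{(a)}{=} H(M|Z^n, E = 1)\big(1 - (1-\epsilon)^n\big)\\
&= H(M)\big(1 - (1-\epsilon)^n\big)\\
&\stackrel{(b)}{=} H(M) - (1-\epsilon)^n.
\end{aligned}$$

Equality (a) follows from the fact that $H(M|Z^n, E = 0) = 0$ and $\mathbb{P}[E = 0] = (1-\epsilon)^n$; the unlabeled equality holds because, as soon as at least one bit is erased, the parity of the transmitted sequence is equally likely to be even or odd, so that M is independent of $Z^n$; and equality (b) follows from H(M) = 1. Hence, we obtain

$$I(M; Z^n) = H(M) - H(M|Z^n) \leq (1-\epsilon)^n,$$

which vanishes exponentially fast with n; therefore, the coding scheme is secure.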
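The bound above can be checked by brute force for small n. The sketch below (the function name `parity_scheme_leakage` is hypothetical) enumerates every codeword and erasure pattern of Example 3.1 and computes $I(M; Z^n)$ exactly. Because the eavesdropper learns the parity only when nothing is erased, the result coincides with $(1-\epsilon)^n$, so the bound is tight for this particular scheme. The enumeration grows as $4^n$ and is only meant for small n.

```python
from itertools import product
from math import log2

def parity_scheme_leakage(n, eps):
    """Exact I(M; Z^n) in bits for the parity scheme of Example 3.1.

    Message 1 -> uniform odd-parity word, message 2 -> uniform even-parity word;
    each transmitted bit is erased independently with probability eps.
    """
    words = list(product((0, 1), repeat=n))
    codebooks = {1: [w for w in words if sum(w) % 2 == 1],
                 2: [w for w in words if sum(w) % 2 == 0]}
    p_mz = {}  # joint distribution of (M, Z^n)
    for m, book in codebooks.items():
        for x in book:
            for erased in product((False, True), repeat=n):
                z = tuple("?" if e else b for b, e in zip(x, erased))
                k = sum(erased)  # number of erasures
                p = 0.5 * (1 / len(book)) * eps**k * (1 - eps) ** (n - k)
                p_mz[(m, z)] = p_mz.get((m, z), 0.0) + p
    p_z = {}
    for (m, z), p in p_mz.items():
        p_z[z] = p_z.get(z, 0.0) + p
    # I(M; Z^n) = sum p(m, z) log2[p(m, z) / (p(m) p(z))] with p(m) = 1/2
    return sum(p * log2(p / (0.5 * p_z[z])) for (m, z), p in p_mz.items() if p > 0)

for n in (2, 4, 6):
    print(n, parity_scheme_leakage(n, 0.3), (1 - 0.3) ** n)
```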


In practice, the coding scheme of Example 3.1 is not really useful because the code 
rate vanishes with n as well, albeit more slowly than does I(M; Z”); nevertheless, the 
example suggests that assigning multiple codewords to every message and selecting 
them randomly is useful to confuse the eavesdropper and to guarantee secrecy. 


3.3 Perfect, weak, and strong secrecy


As mentioned in the previous section, the notion of perfect secrecy is too stringent and is not easily amenable to further analysis. It is convenient to replace the requirement of exact statistical independence between messages M and the eavesdropper’s observations $Z^n$ by asymptotic statistical independence as the codeword length n goes to infinity. In principle, this asymptotic independence can be measured in terms of any distance d defined on the set of joint probability distributions on $\mathcal{M} \times \mathcal{Z}^n$ as

$$\lim_{n\to\infty} d\big(p_{MZ^n}, p_M p_{Z^n}\big) = 0.$$


For instance, in the previous section we implicitly used the Kullback–Leibler divergence³ and we required

$$\lim_{n\to\infty} \mathbb{D}\big(p_{MZ^n} \,\|\, p_M p_{Z^n}\big) = \lim_{n\to\infty} I(M; Z^n) = 0.$$


This condition, which we call the strong secrecy condition, requires the amount of information leaked to the eavesdropper to vanish. For technical purposes, it is also convenient to consider the condition

$$\lim_{n\to\infty} \frac{1}{n} I(M; Z^n) = 0,$$

which requires only the rate of information leaked to the eavesdropper to vanish. This condition is weaker than strong secrecy since it is satisfied as long as $I(M; Z^n)$ grows sub-linearly with n, that is, $I(M; Z^n) = o(n)$. We call it the weak secrecy condition.

From an information-theoretic perspective, the specific measure of asymptotic statisti- 
cal independence may seem irrelevant, and we may be tempted to choose a metric solely 
on the basis of its mathematical tractability; unfortunately, the weak secrecy condition 
and the strong secrecy condition are not equivalent and, more importantly, it is possible 
to construct examples of coding schemes with evident security flaws that satisfy the 
weak secrecy condition. 


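Example 3.2. Let $n \geq 1$ and $t \triangleq \lfloor \sqrt{n} \rfloor$. Suppose that Alice encodes uniformly distributed message bits $M^n \in \{0, 1\}^n$ into a codeword $X^n \in \{0, 1\}^n$ with $n - t$ secret-key bits $K^{n-t} \in \{0, 1\}^{n-t}$ as

$$X_i = \begin{cases} M_i \oplus K_i & \text{for } i \in \llbracket 1, n - t \rrbracket,\\ M_i & \text{for } i \in \llbracket n - t + 1, n \rrbracket. \end{cases}$$

The key bits $K_i$ for $i \in \llbracket 1, n - t \rrbracket$ are assumed i.i.d. according to $\mathcal{B}(\tfrac{1}{2})$ and known to Bob. In other words, Alice performs a one-time pad of the first $n - t$ bits of $M^n$ with the $n - t$ key bits and she appends the remaining t bits unprotected. Eve is assumed to intercept the codeword $X^n$ directly.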


³ Strictly speaking, the Kullback–Leibler divergence is not a distance because it is not symmetric; nevertheless, $\mathbb{D}(p_{MZ^n} \,\|\, p_M p_{Z^n}) = 0$ if and only if M is independent of $Z^n$, and we ignore this subtlety.
Using the crypto lemma, we obtain

$$\forall n \geq 1 \quad H(M|X^n) = n - t = H(M) - t.$$

Therefore, $I(M; X^n) = t = \lfloor \sqrt{n} \rfloor$ and this scheme does not satisfy the strong secrecy criterion; even worse, the information leaked to the eavesdropper grows unbounded with n. However, notice that

$$\lim_{n\to\infty} \frac{1}{n} I(M; X^n) = \lim_{n\to\infty} \frac{\lfloor \sqrt{n} \rfloor}{n} = 0.$$

Hence, this scheme satisfies the weak secrecy criterion.
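A short sketch of the encoder of Example 3.2 may help, assuming bits are represented as Python lists of 0/1 integers (the function names are hypothetical). It shows that Eve reads the last $t = \lfloor \sqrt{n} \rfloor$ message bits directly from $X^n$, so the leakage in bits grows with n, while the normalized leakage $t/n$ shrinks.

```python
import random
from math import isqrt

def encode_example_3_2(m_bits, key_bits):
    """Encoder of Example 3.2: one-time pad on the first n - t bits, last t bits in the clear."""
    n = len(m_bits)
    t = isqrt(n)  # t = floor(sqrt(n))
    if len(key_bits) != n - t:
        raise ValueError("expected exactly n - t key bits")
    padded = [m ^ k for m, k in zip(m_bits[: n - t], key_bits)]
    return padded + list(m_bits[n - t:])

def leakage_example_3_2(n):
    """I(M; X^n) in bits: the t unprotected bits leak, the padded ones do not (crypto lemma)."""
    return isqrt(n)

rng = random.Random(0)
n = 100
m = [rng.randrange(2) for _ in range(n)]
k = [rng.randrange(2) for _ in range(n - isqrt(n))]
x = encode_example_3_2(m, k)
print("last t bits exposed:", x[n - isqrt(n):] == m[n - isqrt(n):])
for size in (10**2, 10**4, 10**6):
    t = leakage_example_3_2(size)
    print(f"n={size:>8}  I(M;X^n)={t:>5}  I(M;X^n)/n={t / size:.4f}")
```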


Example 3.3. Suppose that Alice encodes messages $M = (M_1, \dots, M_n)$ uniformly distributed on $\{0, 1\}^n$ into codewords $X^n \in \{0, 1\}^n$ with secret keys $K^n \in \{0, 1\}^n$ as

$$X_i = M_i \oplus K_i \quad \text{for } i \in \llbracket 1, n \rrbracket.$$

The secret key $K^n$, which we assume is known to Bob, is such that the all-zero n-bit sequence $0^n$ has probability 1/n and all non-zero sequences are equally likely. Formally, the probability distribution of the secret key is

$$p_{K^n}(k^n) = \begin{cases} \dfrac{1}{n} & \text{if } k^n = 0^n,\\[4pt] \dfrac{1 - 1/n}{2^n - 1} & \text{if } k^n \neq 0^n. \end{cases}$$

Since $K^n$ is not uniformly distributed, this encryption scheme no longer guarantees perfect secrecy. As in the previous example, we assume that Eve directly intercepts $X^n$.

We first prove that this scheme satisfies the weak secrecy criterion. We introduce a random variable J such that

$$J \triangleq \begin{cases} 0 & \text{if } K^n = 0^n,\\ 1 & \text{otherwise.} \end{cases}$$

Since conditioning does not increase entropy, we can write

$$\begin{aligned} H(M|X^n) &\geq H(M|X^nJ)\\ &= H(M|X^n, J = 0)\,p_J(0) + H(M|X^n, J = 1)\,p_J(1). \end{aligned} \tag{3.1}$$


By definition, $K^n = 0^n$ if J = 0; hence, $H(M|X^n, J = 0) = 0$ and we can restrict our attention to the term

$$H(M|X^n, J = 1)\,p_J(1) = -\sum_{m, x^n} p(m, x^n, j = 1) \log p(m|x^n, j = 1).$$

For any $m \in \{0, 1\}^n$ and $x^n \in \{0, 1\}^n$, the joint probability $p(m, x^n, j = 1)$ can be written as

$$p(x^n, m, j = 1) = p(m|x^n, j = 1)\, p(x^n|j = 1)\, p(j = 1)$$




with

$$p(m|x^n, j = 1) = \begin{cases} 0 & \text{if } m = x^n,\\ \dfrac{1}{2^n - 1} & \text{otherwise,} \end{cases} \qquad p(x^n|j = 1) = \frac{1}{2^n}, \qquad p(j = 1) = 1 - \frac{1}{n}.$$


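On substituting these values into (3.1), we obtain

$$\begin{aligned}
H(M|X^n) &\geq \sum_{x^n} \sum_{m \neq x^n} \frac{1}{2^n} \frac{1}{2^n - 1}\left(1 - \frac{1}{n}\right) \log(2^n - 1)\\
&= \left(1 - \frac{1}{n}\right) \log(2^n - 1)\\
&= \log(2^n - 1) - \frac{\log(2^n - 1)}{n}\\
&\geq \log(2^n - 1) - 1.
\end{aligned}$$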
Since H(M) = n, we obtain

$$\lim_{n\to\infty} \frac{1}{n} I(M; X^n) = 1 - \lim_{n\to\infty} \frac{1}{n} H(M|X^n) \leq 1 - \lim_{n\to\infty} \frac{\log(2^n - 1) - 1}{n} = 0.$$
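Hence, this scheme satisfies the weak secrecy criterion. However,

$$\begin{aligned}
H(M|X^n) &= H(X^n \oplus K^n|X^n)\\
&= H(K^n|X^n)\\
&\leq H(K^n)\\
&= \frac{1}{n}\log n + \left(1 - \frac{1}{n}\right)\log\left(\frac{2^n - 1}{1 - 1/n}\right)\\
&= H_b(1/n) + \left(1 - \frac{1}{n}\right)\log(2^n - 1)\\
&< H_b(1/n) + n - 1\\
&< n - 0.5 \quad \text{for } n \text{ large enough.}
\end{aligned}$$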
Therefore, $\lim_{n\to\infty} I(M; X^n) \geq 0.5$, and this scheme does not satisfy the strong secrecy criterion.
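The bounds of Example 3.3 can be complemented by an exact expression. Since M is uniform and independent of $K^n$, the crypto lemma makes $X^n$ uniform, so $H(M|X^n) = H(M) + H(K^n) - H(X^n) = H(K^n)$ and $I(M; X^n) = n - H(K^n)$; in particular, the inequality $H(K^n|X^n) \leq H(K^n)$ used above holds with equality here. The sketch below evaluates this expression (the function names are hypothetical). Numerically, the leakage in bits approaches 1 while the normalized leakage vanishes, consistent with the weak-but-not-strong behavior established above.

```python
from math import ldexp, log, log1p, log2

def binary_entropy(p):
    """H_b(p) in bits (0 at the endpoints)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def key_entropy_example_3_3(n):
    """H(K^n) = H_b(1/n) + (1 - 1/n) log2(2^n - 1) for the skewed key of Example 3.3."""
    # log2(2^n - 1) = n + log2(1 - 2^-n); ldexp(1.0, -n) = 2^-n underflows harmlessly to 0.0
    log2_nonzero = n + log1p(-ldexp(1.0, -n)) / log(2)
    return binary_entropy(1 / n) + (1 - 1 / n) * log2_nonzero

def leakage_example_3_3(n):
    """Exact I(M; X^n) = n - H(K^n) in bits (uses that X^n is uniform when M is)."""
    return n - key_entropy_example_3_3(n)

for n in (10, 100, 10_000, 1_000_000):
    leak = leakage_example_3_3(n)
    print(f"n={n:>9}  I(M;X^n)={leak:.5f}  I(M;X^n)/n={leak / n:.2e}")
```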


One could argue that Example 3.2 and Example 3.3 have been constructed ad hoc 
to exhibit flaws. In Example 3.2, the eavesdropper always obtains a fraction of the 
Figure 3.4 Communication over a DWTC. M represents the message to be transmitted securely,
whereas R represents randomness used to randomize the encoder. 


message bits without errors; in Example 3.3, the distribution of the key is skewed so that the all-zero key is overwhelmingly more likely than any other key. Therefore, these examples do not imply that all weakly secure schemes are useless, but merely suggest that not all measures of asymptotic statistical independence are meaningful from a cryptographic perspective. In particular, they indicate that the weak secrecy criterion is likely not appropriate and, consequently, we should strive to prove all results with a strong secrecy criterion.


3.4 Wyner’s wiretap channel


Secrecy capacity was originally introduced by Wyner for a channel model called a 
degraded wiretap channel (DWTC for short). Although this model is a special case of 
the broadcast channel with confidential messages studied in Section 3.5, it allows us to 
introduce many of the mathematical tools of information-theoretic security without the 
additional complexity of fully general models. As illustrated in Figure 3.4, a DWTC 
models a situation in which a sender (Alice) tries to communicate with a legitimate 
receiver (Bob) over a noisy channel, while an eavesdropper (Eve) observes a degraded 
version of the signal obtained by the legitimate receiver. 

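Formally, a discrete memoryless DWTC $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$ consists of a finite input alphabet $\mathcal{X}$, two finite output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and transition probabilities $p_{Y|X}$ and $p_{Z|Y}$ such that

$$\forall n \geq 1 \quad \forall (x^n, y^n, z^n) \in \mathcal{X}^n \times \mathcal{Y}^n \times \mathcal{Z}^n \quad p_{Y^nZ^n|X^n}(y^n, z^n|x^n) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)\, p_{Z|Y}(z_i|y_i). \tag{3.2}$$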


The DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ characterized by the marginal transition probabilities $p_{Y|X}$ is referred to as the main channel, while the DMC $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ characterized by the marginal transition probabilities $p_{Z|X}$ is referred to as the eavesdropper’s channel. The eavesdropper is sometimes called the wiretapper, and, accordingly, its channel is called the wiretapper’s channel, but we avoid this terminology because it makes limited sense for the wireless channels discussed in Chapter 5. Throughout this book, we always
assume that the transmitter, the receiver, and the eavesdropper know the channel statistics 
ahead of time. 

As hinted in Section 3.2, randomness in the encoding process is what enables secure 
communications. It is convenient to represent this randomness by the realization of a 
DMS $(\mathcal{R}, p_R)$, which is independent of the channel and of the messages to be transmitted.
Because the source is available to Alice but not to Bob and Eve, we call it a source of 
local randomness. 


Definition 3.1. A $(2^{nR}, n)$ code $C_n$ for a DWTC consists of

• a message set $\mathcal{M} = \llbracket 1, 2^{nR} \rrbracket$;

• a source of local randomness at the encoder $(\mathcal{R}, p_R)$;

• an encoding function $f : \mathcal{M} \times \mathcal{R} \to \mathcal{X}^n$, which maps a message m and a realization of the local randomness r to a codeword $x^n$;

• a decoding function $g : \mathcal{Y}^n \to \mathcal{M} \cup \{?\}$, which maps each channel observation $y^n$ to a message $m \in \mathcal{M}$ or an error message ?.


Note that the DMS $(\mathcal{R}, p_R)$ is included in the definition because we implicitly assume that it can be optimized as part of the code design. The $(2^{nR}, n)$ code $C_n$ is assumed known by Alice, Bob, and Eve, and this knowledge includes the statistics of the DMS $(\mathcal{R}, p_R)$; however, the realizations of the DMS used for encoding are accessible only to Alice. We also assume that the message M is uniformly distributed in $\mathcal{M}$, so that the code rate is $(1/n)H(M) = R + \delta(n)$. The reliability performance of $C_n$ is measured in terms of its average probability of error

$$P_e(C_n) \triangleq \mathbb{P}\big[\hat{M} \neq M \,\big|\, C_n\big],$$

while its secrecy performance is measured in terms of the equivocation

$$E(C_n) = H(M|Z^nC_n).$$


We emphasize that equivocation is conditioned on the code $C_n$ because the eavesdropper knows the code ahead of time. Equivalently, the secrecy performance of the code $C_n$ can be measured in terms of the leakage

$$L(C_n) = I(M; Z^n|C_n),$$

which measures the information leaked to the eavesdropper instead of the uncertainty of the eavesdropper.


Remark 3.2. In the literature, it is common to introduce the local randomness implicitly by considering a stochastic encoder $f : \mathcal{M} \to \mathcal{X}^n$, which maps a message $m \in \mathcal{M}$ to a codeword $x^n \in \mathcal{X}^n$ according to transition probabilities $p_{X^n|M}$.


Remark 3.3. Stochastic encoding is crucial to enable secure communications but 
there is no point in considering a stochastic decoder. To see this, consider a stochastic 
decoder that maps each channel observation $y^n \in \mathcal{Y}^n$ to a symbol $v \in \mathcal{V}$ according to transition probabilities $p_{V|Y^n}$, where $\mathcal{V}$ is an arbitrary alphabet. From the data-processing inequality, we have $I(M; V) \leq I(M; Y^n)$; therefore, according to the channel coding theorem, stochastic decoding can only reduce the rate of reliable transmission over the main channel, while having no effect on the eavesdropper’s equivocation.


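Definition 3.2. A weak rate–equivocation pair $(R, R_e)$ is achievable for the DWTC if there exists a sequence of $(2^{nR}, n)$ codes $\{C_n\}_{n\geq 1}$ such that

$$\lim_{n\to\infty} P_e(C_n) = 0 \quad \text{(reliability condition)}, \tag{3.3}$$

$$\lim_{n\to\infty} \frac{1}{n} E(C_n) \geq R_e \quad \text{(weak secrecy condition)}. \tag{3.4}$$

The weak rate–equivocation region of a DWTC is

$$\mathcal{R}^{\text{DWTC}} \triangleq \text{cl}\big(\{(R, R_e) : (R, R_e) \text{ is achievable}\}\big),$$

and the weak secrecy capacity of a DWTC is

$$C_s^{\text{DWTC}} \triangleq \sup\{R : (R, R) \in \mathcal{R}^{\text{DWTC}}\}.$$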


Remark 3.4. According to our definition, if a rate–equivocation pair $(R, R_e)$ is achievable, then any pair $(R, R_e')$ with $R_e' \leq R_e$ is achievable as well. In particular, note that $(R, 0)$ is always achievable.


The rate–equivocation region $\mathcal{R}^{\text{DWTC}}$ encompasses rate–equivocation pairs for which $R_e$ is not equal to R; it characterizes the equivocation rate that can be guaranteed for an arbitrary rate R. If a pair $(R, R_e)$ with $R_e = R$ is achievable, we say that R is a full secrecy rate. In this case, notice that a sequence of $(2^{nR}, n)$ codes $\{C_n\}_{n\geq 1}$ that achieves a full secrecy rate satisfies

$$\lim_{n\to\infty} \frac{1}{n} L(C_n) = 0.$$


Full secrecy is of practical importance because the messages transmitted are then entirely 
hidden from the eavesdropper. In the literature, full secrecy is sometimes called “perfect 
secrecy.” In this book, the term “perfect secrecy” is restricted to Shannon’s definition of 
information-theoretic security, which requires exact statistical independence. 

The secrecy condition (3.4) is weak because it is based on the equivocation rate $(1/n)E(C_n)$. As discussed in Section 3.3, it would be preferable to use a stronger condition and to rely on the following definition.


Definition 3.3. A strong rate–equivocation pair $(R, R_e)$ is achievable for the DWTC if there exists a sequence of $(2^{nR}, n)$ codes $\{C_n\}_{n\geq 1}$ such that

$$\lim_{n\to\infty} P_e(C_n) = 0 \quad \text{(reliability condition)}, \tag{3.5}$$

$$\lim_{n\to\infty} \big(E(C_n) - nR_e\big) \geq 0 \quad \text{(strong secrecy condition)}. \tag{3.6}$$

The strong rate–equivocation region of a DWTC is

$$\overline{\mathcal{R}}^{\text{DWTC}} \triangleq \text{cl}\big(\{(R, R_e) : (R, R_e) \text{ is achievable}\}\big),$$

Figure 3.5 Typical shape of the rate–equivocation region $\mathcal{R}^{\text{DWTC}}(p_X)$.

and the strong secrecy capacity of a DWTC is

$$\overline{C}_s^{\text{DWTC}} \triangleq \sup\{R : (R, R) \in \overline{\mathcal{R}}^{\text{DWTC}}\}.$$

Unfortunately, directly dealing with the stronger condition (3.6) is more arduous than dealing with the weak secrecy condition (3.4). Moreover, we will show in Section 4.5 that $\overline{\mathcal{R}}^{\text{DWTC}} = \mathcal{R}^{\text{DWTC}}$ and $\overline{C}_s^{\text{DWTC}} = C_s^{\text{DWTC}}$; therefore, we will content ourselves with (3.4) for now, but the reader should keep in mind that this is mainly for mathematical tractability.

It is not a priori obvious whether the reliability condition (3.3) and the secrecy 
condition (3.4) can be satisfied simultaneously. On the one hand, reliability calls for the 
introduction of redundancy to mitigate the effect of channel noise; on the other hand, 
creating too much redundancy is likely to jeopardize secrecy. Perhaps surprisingly, the 
balance between reliability and secrecy can be precisely controlled with appropriate 
coding schemes and the rate—equivocation region can be characterized exactly. 


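Theorem 3.2 (Wyner). Consider a DWTC $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$. For any distribution $p_X$ on $\mathcal{X}$, define the set $\mathcal{R}^{\text{DWTC}}(p_X)$ as

$$\mathcal{R}^{\text{DWTC}}(p_X) \triangleq \left\{(R, R_e) : \begin{array}{l} 0 \leq R_e \leq R \leq I(X; Y)\\ 0 \leq R_e \leq I(X; Y|Z) \end{array}\right\},$$

where the joint distribution of X, Y, and Z factorizes as $p_X p_{Y|X} p_{Z|Y}$. Then, the rate–equivocation region of the DWTC is the convex region

$$\mathcal{R}^{\text{DWTC}} = \bigcup_{p_X} \mathcal{R}^{\text{DWTC}}(p_X). \tag{3.7}$$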
The typical shape of $\mathcal{R}^{\text{DWTC}}(p_X)$ is illustrated in Figure 3.5. At transmission rates
below I(X; Y|Z), it is always possible to find codes achieving full secrecy rates. It is 
also possible to transmit at rates above I(X; Y|Z), but the equivocation rate saturates at 
Re = I(X; Y|Z), and there is no secrecy guaranteed for the remaining fraction of the rate. 
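As an illustration of one slice $\mathcal{R}^{\text{DWTC}}(p_X)$ of Theorem 3.2, the sketch below assumes a hypothetical DWTC made of a BSC($p_{\text{main}}$) main channel followed by a further BSC($p_{\text{deg}}$) for the eavesdropper, together with a uniform input $p_X$. In that case $I(X;Y) = 1 - H_b(p_{\text{main}})$ and, by degradedness, $I(X;Y|Z) = I(X;Y) - I(X;Z)$. The function `in_region` (a hypothetical name) simply encodes the two inequalities defining $\mathcal{R}^{\text{DWTC}}(p_X)$: pairs with $R_e = R$ are admissible only up to $I(X;Y|Z)$, beyond which $R_e$ saturates.

```python
from math import log2

def hb(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_cascade_rates(p_main, p_deg):
    """(I(X;Y), I(X;Y|Z)) for uniform X, Y = BSC_{p_main}(X), Z = BSC_{p_deg}(Y)."""
    p_eve = p_main * (1 - p_deg) + p_deg * (1 - p_main)  # X -> Z is a BSC(p_eve)
    i_xy = 1 - hb(p_main)
    i_xz = 1 - hb(p_eve)
    return i_xy, i_xy - i_xz  # degradedness: I(X;Y|Z) = I(X;Y) - I(X;Z)

def in_region(rate, equiv, i_xy, i_xy_given_z):
    """Membership in R^DWTC(p_X): 0 <= Re <= R <= I(X;Y) and Re <= I(X;Y|Z)."""
    return 0 <= equiv <= rate <= i_xy and equiv <= i_xy_given_z

i_xy, i_sec = bsc_cascade_rates(0.1, 0.1)
print(f"I(X;Y) = {i_xy:.4f}, I(X;Y|Z) = {i_sec:.4f}")
for point in [(0.2, 0.2), (0.5, 0.2), (0.5, 0.3), (0.6, 0.1)]:
    print(point, in_region(*point, i_xy, i_sec))
```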


Remark 3.5. In Wyner’s original work, the equivocation rate is defined per source symbol as $\Delta = (1/k)H(M|Z^nC_n)$ with $k = \log\lceil 2^{nR} \rceil$. Since $\Delta = R_e/R$, the rate–equivocation region $(R, \Delta)$ can be obtained from the rate–equivocation region $(R, R_e)$, but, in general, the region $(R, \Delta)$ is not convex.
Theorem 3.2 is proved in Section 3.4.1 and Section 3.4.2. Before getting into the
details of the proof, it is instructive to consider some of its implications. First, by 
specializing Theorem 3.2 to full secrecy rates for which Re = R, we obtain the secrecy 
capacity of the degraded wiretap channel. 


Corollary 3.1. The secrecy capacity of a DWTC $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$ is

$$C_s^{\text{DWTC}} = \max_{p_X} I(X; Y|Z) = \max_{p_X} \big(I(X; Y) - I(X; Z)\big). \tag{3.8}$$


If Y = Z, that is, the eavesdropper obtains the same observation as the legitimate receiver, then I(X; Y|Z) = 0 and thus $C_s^{\text{DWTC}} = 0$. This result is consistent with the analysis of Shannon’s cipher system in Section 3.1 and the idea that information-theoretic
security cannot be achieved over noiseless channels without secret keys. 
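The second equality in (3.8) rests on the identity $I(X;Y) - I(X;Z) = I(X;Y|Z)$, which holds because $X \to Y \to Z$ forms a Markov chain in a DWTC. The following pure-Python sketch (helper names, alphabet sizes, and seed are arbitrary assumptions) builds $p_{XYZ} = p_X p_{Y|X} p_{Z|Y}$ from randomly drawn transition matrices, as in (3.2) with n = 1, and checks the identity numerically.

```python
import random
from math import log2

def random_distribution(size, rng):
    """Random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def degraded_information_terms(p_x, w_yx, w_zy):
    """Return I(X;Y), I(X;Z), I(X;Y|Z) for p_XYZ(x,y,z) = p_X(x) p_{Y|X}(y|x) p_{Z|Y}(z|y)."""
    joint = {(x, y, z): p_x[x] * w_yx[x][y] * w_zy[y][z]
             for x in range(len(p_x))
             for y in range(len(w_zy))
             for z in range(len(w_zy[0]))}

    def H(*coords):
        # Entropy of the marginal of the joint distribution on the given coordinates
        marginal = {}
        for key, p in joint.items():
            sub = tuple(key[i] for i in coords)
            marginal[sub] = marginal.get(sub, 0.0) + p
        return entropy(marginal.values())

    i_xy = H(0) + H(1) - H(0, 1)
    i_xz = H(0) + H(2) - H(0, 2)
    i_xy_given_z = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)
    return i_xy, i_xz, i_xy_given_z

rng = random.Random(1)
p_x = random_distribution(3, rng)
w_yx = [random_distribution(4, rng) for _ in range(3)]  # p_{Y|X}: 3 inputs, 4 outputs
w_zy = [random_distribution(2, rng) for _ in range(4)]  # p_{Z|Y}: 4 inputs, 2 outputs
i_xy, i_xz, i_xy_z = degraded_information_terms(p_x, w_yx, w_zy)
print(i_xy - i_xz, i_xy_z)
```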


Remark 3.6. Theorem 3.2 and Corollary 3.1 also hold for a vector channel on replacing 
random variables by random vectors where appropriate. 


Corollary 3.1 is quite appealing because the secrecy capacity is expressed as the differ- 
ence between an information rate conveyed to the legitimate receiver and an information 
rate leaked to the eavesdropper. To obtain an even simpler and more intuitive characterization, it is also useful to relate the secrecy capacity to the main channel capacity $C_m \triangleq \max_{p_X} I(X; Y)$ and to the eavesdropper’s channel capacity $C_e \triangleq \max_{p_X} I(X; Z)$. For a generic DWTC $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$, we have

$$\begin{aligned}
C_s^{\text{DWTC}} &= \max_{p_X} \big(I(X; Y) - I(X; Z)\big)\\
&\geq \max_{p_X} I(X; Y) - \max_{p_X} I(X; Z)\\
&= C_m - C_e;
\end{aligned}$$


that is, the secrecy capacity is at least as large as the difference between the main 
channel capacity and the eavesdropper’s channel capacity. The inequality can be strict, 
as illustrated by the following example. 


Example 3.4. Consider the DWTC illustrated in Figure 3.6, in which the main channel is 
a “Z-channel” with parameter p, while the eavesdropper’s channel is a binary symmetric 
channel with cross-over probability p. One can check that 


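$$\begin{aligned}
C_m &= \max_{q \in [0, 1]} \big(H_b(q(1-p)) - qH_b(p)\big),\\
C_e &= 1 - H_b(p),\\
C_s^{\text{DWTC}} &= \max_{q \in [0, 1]} \big(H_b(q(1-p)) + (1-q)H_b(p) - H_b(p + q - 2pq)\big).
\end{aligned}$$

For p = 0.1, we obtain numerically $C_m - C_e \approx 0.232$ bits, whereas $C_s^{\text{DWTC}} \approx 0.246$ bits.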
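The numerical values above can be reproduced with a simple grid search over q, here taken to be $\mathbb{P}[X = 1]$, which is consistent with the expressions for $C_m$ and $C_s^{\text{DWTC}}$; the function name `example_3_4` and the grid size are assumptions. A grid search is adequate in this sketch because each objective is a smooth function of a single parameter on [0, 1].

```python
from math import log2

def hb(p):
    """Binary entropy H_b(p) in bits (0 at the endpoints)."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)

def example_3_4(p, steps=20_000):
    """Grid-search evaluation of C_m, C_e, and C_s^DWTC from Example 3.4, with q = P[X = 1]."""
    qs = [i / steps for i in range(steps + 1)]
    c_m = max(hb(q * (1 - p)) - q * hb(p) for q in qs)
    c_e = 1 - hb(p)
    c_s = max(hb(q * (1 - p)) + (1 - q) * hb(p) - hb(p + q - 2 * p * q) for q in qs)
    return c_m, c_e, c_s

c_m, c_e, c_s = example_3_4(0.1)
print(f"C_m - C_e = {c_m - c_e:.3f} bits, C_s = {c_s:.3f} bits")
```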


Nevertheless, there are DWTCs for which the lower bound Cm — Ce turns out to be 
exactly the secrecy capacity. 
Figure 3.6 Example of a DWTC with non-symmetric channels.


Definition 3.4 (Weakly symmetric channels). A DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is weakly symmetric if the rows of the channel transition-probability matrix are permutations of each other and the column sums $\sum_{x \in \mathcal{X}} p_{Y|X}(y|x)$ are independent of y.


An important characteristic of weakly symmetric channels is captured by the following 
lemma. 


Lemma 3.2. The capacity-achieving input distribution of a weakly symmetric channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is the uniform distribution over $\mathcal{X}$.


Proof. For an input distribution $p_X$, the mutual information I(X; Y) is

$$I(X; Y) = H(Y) - H(Y|X) = H(Y) - \sum_{x \in \mathcal{X}} H(Y|X = x)\, p_X(x).$$

Notice that H(Y|X = x) is a constant, say H, that is independent of x because the rows of the channel transition-probability matrix are permutations of each other. Thus,

$$I(X; Y) = H(Y) - H \leq \log|\mathcal{Y}| - H,$$

with equality if Y is uniform. We show that choosing $p_X(x) = 1/|\mathcal{X}|$ for all $x \in \mathcal{X}$ induces a uniform distribution for Y. In fact,

$$p_Y(y) = \sum_{x \in \mathcal{X}} p_{Y|X}(y|x)\, p_X(x) = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} p_{Y|X}(y|x).$$

Since $\sum_{x} p_{Y|X}(y|x)$ is independent of y by assumption, $p_Y(y)$ is a constant. By the law of total probability, it must hold that $p_Y(y) = 1/|\mathcal{Y}|$ for all $y \in \mathcal{Y}$.
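The lemma can be checked numerically on a small example. The sketch below assumes the 2-input, 3-output transition matrix with rows (1/3, 1/6, 1/2) and (1/3, 1/2, 1/6), a standard example of a channel that is weakly symmetric but not symmetric (its columns are not permutations of each other); the helper names are hypothetical. A grid search over binary input distributions finds the maximum of I(X; Y) at the uniform input, and the maximum equals $\log|\mathcal{Y}| - H$, where H is the entropy of a row.

```python
from math import log2

# Assumed example: p_{Y|X}(y|x), rows indexed by x in {0, 1}, columns by y in {0, 1, 2}
W = [[1 / 3, 1 / 6, 1 / 2],
     [1 / 3, 1 / 2, 1 / 6]]

def is_weakly_symmetric(w, tol=1e-12):
    """Rows are permutations of each other and column sums do not depend on y."""
    ref = sorted(w[0])
    rows_ok = all(all(abs(a - b) < tol for a, b in zip(sorted(row), ref)) for row in w)
    col_sums = [sum(row[y] for row in w) for y in range(len(w[0]))]
    cols_ok = all(abs(s - col_sums[0]) < tol for s in col_sums)
    return rows_ok and cols_ok

def mutual_information(p_x, w):
    """I(X;Y) in bits for input distribution p_x and channel matrix w."""
    n_x, n_y = len(w), len(w[0])
    p_y = [sum(p_x[x] * w[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(p_x[x] * w[x][y] * log2(w[x][y] / p_y[y])
               for x in range(n_x) for y in range(n_y) if p_x[x] * w[x][y] > 0)

steps = 1000
best_q = max((i / steps for i in range(steps + 1)),
             key=lambda q: mutual_information([1 - q, q], W))
row_entropy = -sum(p * log2(p) for p in W[0])
print(is_weakly_symmetric(W), best_q)
print(mutual_information([0.5, 0.5], W), log2(3) - row_entropy)
```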


Proposition 3.2 (Leung-Yan-Cheong). If the main channel and the eavesdropper’s channel of a DWTC $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$ are both weakly symmetric, then

$$C_s^{\text{DWTC}} = C_m - C_e, \tag{3.9}$$

where $C_m$ is the capacity of the main channel and $C_e$ is that of the eavesdropper’s channel.


The proof of Proposition 3.2 hinges on a general concavity property of the conditional 
mutual information I(X; Y|Z), which we establish in the following lemma. 


Lemma 3.3. Let $X\in\mathcal{X}$, $Y\in\mathcal{Y}$, and $Z\in\mathcal{Z}$ be three random variables with joint probability distribution $p_{XYZ}$. Then, $I(X;Y|Z)$ is a concave function of $p_X$ for fixed $p_{YZ|X}$.
Proof. For fixed transition probabilities $p_{YZ|X}$, we interpret $I(X;Y|Z)$ as a function of $p_X$ and we write $I(X;Y|Z) \triangleq f(p_X)$. Let $X_1$, $Y_1$, and $Z_1$ be random variables such that
$$\forall (x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\quad p_{X_1Y_1Z_1}(x,y,z) = p_{YZ|X}(y,z|x)\,p_{X_1}(x).$$
Similarly, let $X_2$, $Y_2$, and $Z_2$ be random variables such that
$$\forall (x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\quad p_{X_2Y_2Z_2}(x,y,z) = p_{YZ|X}(y,z|x)\,p_{X_2}(x).$$
We introduce the random variable $Q\in\{1,2\}$, which is independent of all others, such that
$$Q \triangleq \begin{cases}1 & \text{with probability } \lambda,\\ 2 & \text{with probability } 1-\lambda,\end{cases}$$
and we define the random variables
$$X \triangleq X_Q,\quad Y \triangleq Y_Q,\quad\text{and}\quad Z \triangleq Z_Q.$$
Note that $Q\to X\to YZ$ forms a Markov chain and that, for all $x\in\mathcal{X}$, $p_X(x) = \lambda p_{X_1}(x) + (1-\lambda)p_{X_2}(x)$. Then,
$$\begin{aligned}
I(X;Y|Z) &= H(Y|Z) - H(Y|XZ)\\
&\ge H(Y|ZQ) - H(Y|XZQ),
\end{aligned}$$
where the inequality follows from $H(Y|Z) \ge H(Y|ZQ)$, since conditioning does not increase entropy, and from $H(Y|ZX) = H(Y|ZXQ)$, since $Q$ is independent of $(Y, Z)$ given $X$. Therefore,
$$I(X;Y|Z) \ge I(X;Y|ZQ) = \lambda I(X_1;Y_1|Z_1) + (1-\lambda)I(X_2;Y_2|Z_2),$$
or, equivalently,
$$f\big(\lambda p_{X_1} + (1-\lambda)p_{X_2}\big) \ge \lambda f(p_{X_1}) + (1-\lambda) f(p_{X_2}),$$
which is the desired result.


Note that Lemma 3.3 holds for any transition probabilities $p_{YZ|X}$, not just those corresponding to a degraded channel.
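A quick numerical sanity check of Lemma 3.3 takes only a few lines. This is a minimal sketch that assumes a randomly drawn transition law $p_{YZ|X}$ on small alphabets. That choice is arbitrary and generally not degraded, which is consistent with the note above. The sketch evaluates $f(p_X) = I(X;Y|Z)$ through $H(XZ) + H(YZ) - H(XYZ) - H(Z)$ and records the smallest value of $f(\lambda p_{X_1} + (1-\lambda)p_{X_2}) - \lambda f(p_{X_1}) - (1-\lambda)f(p_{X_2})$ over random inputs. It illustrates the lemma and does not replace the proof.

```python
import random
from math import log2


def random_dist(k, rng):
    """A random probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]


def cond_mutual_info(p_x, channel):
    """f(p_X) = I(X;Y|Z) in bits, where channel[x][(y, z)] = p_{YZ|X}(y, z|x)."""
    joint = {(x, y, z): p_x[x] * w
             for x in range(len(p_x)) for (y, z), w in channel[x].items()}

    def H(idx):
        """Entropy of the marginal on the coordinates listed in idx."""
        marg = {}
        for key, p in joint.items():
            sub = tuple(key[i] for i in idx)
            marg[sub] = marg.get(sub, 0.0) + p
        return -sum(p * log2(p) for p in marg.values() if p > 0)

    # I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)
    return H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))


rng = random.Random(0)
nx, ny, nz = 3, 2, 2                                   # assumed alphabet sizes
pairs = [(y, z) for y in range(ny) for z in range(nz)]
channel = [dict(zip(pairs, random_dist(len(pairs), rng))) for _ in range(nx)]

worst_gap = float("inf")
for _ in range(500):
    p1, p2, lam = random_dist(nx, rng), random_dist(nx, rng), rng.random()
    mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    gap = cond_mutual_info(mix, channel) - (
        lam * cond_mutual_info(p1, channel) + (1 - lam) * cond_mutual_info(p2, channel))
    worst_gap = min(worst_gap, gap)

print(f"smallest concavity gap over 500 trials: {worst_gap:.3e}")
```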


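Proof of Proposition 3.2. The DMCs $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ are weakly symmetric. By Lemma 3.2, $I(X;Y)$ and $I(X;Z)$ are therefore both maximized if $X$ is uniformly distributed over $\mathcal{X}$. For a degraded channel, $I(X;Y) - I(X;Z) = I(X;Y|Z)$, which is a concave function of $p_X$ by Lemma 3.3. The uniform distribution lies in the interior of the probability simplex and maximizes both $I(X;Y)$ and $I(X;Z)$. Their directional derivatives along the simplex therefore vanish there, and so do those of their difference. A concave function is maximized at such a stationary point. Therefore, $I(X;Y|Z)$ is also maximized if $X$ is uniformly distributed, and
$$C_s^{\mathrm{DWTC}} = \max_{p_X} I(X;Y|Z) = \max_{p_X} I(X;Y) - \max_{p_X} I(X;Z) = C_m - C_e.$$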
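Proposition 3.2 can be illustrated numerically with a simple sketch. It assumes a DWTC obtained by cascading a BSC(p) and a BSC(q), so that the eavesdropper sees a BSC(p + q - 2pq). The sketch maximizes $I(X;Y) - I(X;Z)$ over $P[X=1]$ on a grid and compares the maximum with $C_m - C_e$. The crossover values are arbitrary.

```python
from math import log2


def hb(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def bsc_mutual_info(a, eps):
    """I(X;Y) for a BSC(eps) whose input satisfies P[X = 1] = a."""
    return hb(a * (1 - eps) + (1 - a) * eps) - hb(eps)


p, q = 0.1, 0.2                    # assumed crossover probabilities
pz = p + q - 2 * p * q             # the cascade X -> Z is a BSC(p + q - 2pq)


def secrecy_rate(a):
    """I(X;Y) - I(X;Z), which equals I(X;Y|Z) for this degraded channel."""
    return bsc_mutual_info(a, p) - bsc_mutual_info(a, pz)


grid = [k / 10_000 for k in range(10_001)]
best_a = max(grid, key=secrecy_rate)
Cm_minus_Ce = (1 - hb(p)) - (1 - hb(pz))

print(f"argmax P[X=1] = {best_a:.4f}")
print(f"max I(X;Y|Z) = {secrecy_rate(best_a):.6f}, Cm - Ce = {Cm_minus_Ce:.6f}")
```

The maximizer sits at the uniform input, as the proof predicts, and the maximum coincides with $H_b(p+q-2pq) - H_b(p)$.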


Figure 3.7 Example of a DWTC with weakly symmetric channels.

Remark 3.7. From the proof of Proposition 3.2, we see that a sufficient condition to obtain $C_s^{\mathrm{DWTC}} = C_m - C_e$ is that $I(X;Y)$ and $I(X;Z)$ are maximized for the same input distribution $p_X$. Nevertheless, checking that the channels $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ are weakly symmetric is, in general, a much simpler task.

Proposition 3.2 is useful because many channels of interest, such as binary symmetric channels, are weakly symmetric, and their secrecy capacity then follows easily.

Example 3.5. Consider the DWTC illustrated in Figure 3.7, which is obtained by cascading two binary symmetric channels $\mathrm{BSC}(p)$ and $\mathrm{BSC}(q)$. The main channel is symmetric by construction, and the eavesdropper's channel is a $\mathrm{BSC}(p + q - 2pq)$, which is also symmetric. Therefore, by Proposition 3.2,
$$\begin{aligned}
C_s^{\mathrm{DWTC}} &= C_m - C_e\\
&= 1 - H_b(p) - \big(1 - H_b(p+q-2pq)\big)\\
&= H_b(p+q-2pq) - H_b(p).
\end{aligned}$$

3.4.1 Achievability proof for the degraded wiretap channel


In this section, we prove that the rate pairs in $\mathcal{R}^{\mathrm{DWTC}}$ given by Theorem 3.2 are achievable.
As is usual in information theory, we use a random-coding argument, and we show the 
existence of codes for the DWTC without constructing them explicitly; nevertheless, 
before we can start the proof, it is still necessary to identify a generic code structure that 
can guarantee secrecy and reliability simultaneously. In the next paragraphs, we do so 
by developing two desirable properties that wiretap codes should satisfy to guarantee 
full secrecy. 

The discussion and example in Section 3.2 suggest that several codewords should represent the same message and that the choice of which codeword to transmit should be random, to "confuse" the eavesdropper. This statement can be made somewhat more precise: we can argue that, in general, a wiretap code must possess this property. To see this, assume that we use a wiretap code $\mathcal{C}_n$ that guarantees communication with full secrecy. It is reasonable to assume that messages are determined uniquely by codewords, that is, $H(M|X^n\mathcal{C}_n) = 0$. In addition, assume that the encoding function is a one-to-one mapping, that is, $H(X^n|M\mathcal{C}_n) = 0$. We can write the leakage $(1/n)L(\mathcal{C}_n)$ as
$$\begin{aligned}
\frac{1}{n}L(\mathcal{C}_n) &= \frac{1}{n}I(M;Z^n|\mathcal{C}_n)\\
&= \frac{1}{n}I(MX^n;Z^n|\mathcal{C}_n) - \frac{1}{n}I(X^n;Z^n|M\mathcal{C}_n)\\
&= \frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) + \frac{1}{n}I(M;Z^n|X^n\mathcal{C}_n) - \frac{1}{n}I(X^n;Z^n|M\mathcal{C}_n).
\end{aligned}$$
Since we have assumed that $H(M|X^n\mathcal{C}_n) = 0$ and $H(X^n|M\mathcal{C}_n) = 0$, we also have $I(M;Z^n|X^n\mathcal{C}_n) = 0$ and $I(X^n;Z^n|M\mathcal{C}_n) = 0$; therefore,
$$\frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) = \frac{1}{n}L(\mathcal{C}_n),$$
which means that the information leaked to the eavesdropper about codewords is equal to the information leaked to the eavesdropper about messages. If $\mathcal{C}_n$ allows communication in full secrecy, then for some small $\epsilon > 0$ we have $(1/n)L(\mathcal{C}_n) \le \epsilon$ and, consequently, $(1/n)I(X^n;Z^n|\mathcal{C}_n) \le \epsilon$ as well. Notice that the relation between $X^n$ and $Z^n$ is determined in part by the channel, over which the transmitter does not have full control. For a DMC, guaranteeing that $(1/n)I(X^n;Z^n|\mathcal{C}_n) \le \epsilon$ is in general⁴ possible only if $(1/n)H(X^n|\mathcal{C}_n) \le \delta(\epsilon)$; that is, the transmission rate must be negligible. Therefore, to transmit at a non-negligible rate, we need $(1/n)H(X^n|M\mathcal{C}_n)$ to be non-zero. In other words, the encoder should select a codeword at random among a set of codewords representing the same message. As illustrated in Figure 3.8, we can think of such a set as a sub-codebook or as a "bin" of codewords within the codebook; hence, we will say that a wiretap code should possess a binning structure.

Figure 3.8 Binning structure and encoding process for a wiretap code.


Remark 3.8. Despite the similarity between Figure 3.8 and Figure 2.10, the binning 
structure of a wiretap code is different from the superposition coding structure introduced 
for the broadcast channel in Section 2.3.3. A wiretap code consists of a single codebook 
partitioned into bins, whereas a superposition codebook for the broadcast channel 
consists of several codebooks that are superposed. 


⁴ It is possible to prove that, with high probability, the good random codes identified by random-coding arguments leak an amount of information that grows linearly with n over a DMC.
The second desirable property of wiretap codes concerns the local randomness $R$ used in the encoding process. Since codewords are a function of messages and local randomness, note that $I(X^n;Z^n|\mathcal{C}_n) = I(MR;Z^n|\mathcal{C}_n)$. In addition, since the local randomness $R$ is independent of $M$, we have $H(R|M\mathcal{C}_n) = H(R|\mathcal{C}_n)$. Therefore,
$$\begin{aligned}
\frac{1}{n}L(\mathcal{C}_n) &= \frac{1}{n}I(M;Z^n|\mathcal{C}_n)\\
&= \frac{1}{n}I(MR;Z^n|\mathcal{C}_n) - \frac{1}{n}I(R;Z^n|M\mathcal{C}_n)\\
&= \frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) - \frac{1}{n}H(R|M\mathcal{C}_n) + \frac{1}{n}H(R|Z^nM\mathcal{C}_n)\\
&= \frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) - \frac{1}{n}H(R|\mathcal{C}_n) + \frac{1}{n}H(R|Z^nM\mathcal{C}_n). \qquad (3.10)
\end{aligned}$$
If the code $\mathcal{C}_n$ allows communication in full secrecy, then it must be that $(1/n)L(\mathcal{C}_n) \le \epsilon$ for some small $\epsilon > 0$. Note that (3.10) suggests that this is indeed possible. The confusion introduced by the source of local randomness is represented by the term $(1/n)H(R|\mathcal{C}_n)$, which compensates in part for the information rate $(1/n)I(X^n;Z^n|\mathcal{C}_n)$ leaked to the eavesdropper. However, to ensure that the confusion cancels out the information rate leaked, it seems desirable to design a code such that $(1/n)H(R|Z^nM\mathcal{C}_n)$ is small.
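Identity (3.10) is an exact chain-rule decomposition, so it can be verified on any small code. The sketch below uses a hypothetical length-2 binary code: the message bit $m$ selects the even-parity or the odd-parity bin, and one bit $r$ of local randomness selects the codeword inside the bin. The eavesdropper's channel is assumed to be a BSC(0.1). The sketch evaluates both sides of (3.10) exactly from the joint distribution of $(M, R, X^n, Z^n)$.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy (bits) of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def H(joint, idx):
    """Entropy of the marginal of `joint` on the coordinates listed in idx."""
    marg = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        marg[sub] = marg.get(sub, 0.0) + p
    return entropy(marg)


def bsc_word_prob(x, z, eps):
    """P[Z^n = z | X^n = x] for a memoryless BSC(eps)."""
    prob = 1.0
    for xi, zi in zip(x, z):
        prob *= eps if xi != zi else 1 - eps
    return prob


n, eps = 2, 0.1   # assumed block length and eavesdropper crossover probability
# Hypothetical binning code codebook[(m, r)] = x^n: bin m = 0 holds the
# even-parity words, bin m = 1 the odd-parity words.
codebook = {(0, 0): (0, 0), (0, 1): (1, 1),
            (1, 0): (0, 1), (1, 1): (1, 0)}

# Joint law of (M, R, X^n, Z^n) with M, R independent and uniform on {0, 1}.
joint = {(m, r, x, z): 0.25 * bsc_word_prob(x, z, eps)
         for (m, r), x in codebook.items()
         for z in product((0, 1), repeat=n)}
M, R, X, Z = 0, 1, 2, 3

I_MZ = H(joint, [M]) + H(joint, [Z]) - H(joint, [M, Z])     # equals L(C_n)
I_XZ = H(joint, [X]) + H(joint, [Z]) - H(joint, [X, Z])
H_R = H(joint, [R])
H_R_given_ZM = H(joint, [R, Z, M]) - H(joint, [Z, M])

lhs = I_MZ / n
rhs = (I_XZ - H_R + H_R_given_ZM) / n                     # right-hand side of (3.10)
print(f"(1/n)L = {lhs:.4f}   (1/n)I(X;Z) = {I_XZ / n:.4f}")
print(f"(1/n)H(R) = {H_R / n:.4f}   (1/n)H(R|ZM) = {H_R_given_ZM / n:.4f}")
```

In this toy code the leakage rate (about 0.16 bit) is much smaller than $(1/n)I(X^n;Z^n|\mathcal{C}_n)$ (about 0.53 bit). It is not zero, however, because $(1/n)H(R|Z^nM\mathcal{C}_n)$ (about 0.13 bit) is far from negligible, which is exactly the term the text suggests keeping small.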


Remark 3.9. Now that generic properties of wiretap codes have been identified, it would be tempting to start a random-coding argument and to analyze both the error probability and the equivocation rate for a random-code ensemble. However, there is a subtle but critical detail to which we must pay attention. Let $\mathsf{C}_n$ be the random variable that denotes the choice of a code $\mathcal{C}_n$ in the code ensemble. The probability of error averaged over the ensemble is then
$$\mathbb{P}[\hat{M}\neq M] = \mathbb{E}_{\mathsf{C}_n}\big[\mathbb{P}[\hat{M}\neq M|\mathsf{C}_n]\big] = \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\,P_e(\mathcal{C}_n).$$
In other words, the probability of error averaged over the ensemble is equal to the average of the probability of error of individual codes. Consequently, if the probability of error averaged over the ensemble is smaller than some $\epsilon > 0$, there must exist at least one specific code $\mathcal{C}_n$ with $P_e(\mathcal{C}_n) \le \epsilon$. In contrast, for the equivocation of the code ensemble $H(M|Z^n)$, we have
$$\frac{1}{n}H(M|Z^n) \ge \frac{1}{n}H(M|Z^n\mathsf{C}_n) = \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\,\frac{1}{n}E(\mathcal{C}_n).$$
The equivocation of the code ensemble is at least as large as the average of the equivocations of individual codes. Therefore, even if $(1/n)H(M|Z^n)$ is greater than some value $R_e$, this does not ensure the existence of a specific code $\mathcal{C}_n$ such that $(1/n)E(\mathcal{C}_n) \ge R_e$. Consequently, our proof must somehow analyze $H(M|Z^n\mathcal{C}_n)$ or $H(M|Z^n\mathsf{C}_n)$ directly. Wyner's original approach was to study the equivocation $H(M|Z^n\mathcal{C}_n)$ of a well-constructed code $\mathcal{C}_n$; in this book we choose to study $H(M|Z^n\mathsf{C}_n)$ directly with a random-coding argument.


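For ease of reading, we carry out the proof in three distinct steps.

1. For a fixed distribution $p_X$ on $\mathcal{X}$ and $R < I(X;Y|Z)$, we use a random-coding argument and show the existence of a sequence of $(2^{nR}, n)$ codes $\{\mathcal{C}_n\}_{n\ge1}$ that possess the binning structure illustrated in Figure 3.8 and are such that
$$\lim_{n\to\infty} P_e(\mathcal{C}_n) = 0,\quad \lim_{n\to\infty}\frac{1}{n}H(R|Z^nM\mathcal{C}_n) = 0,\quad\text{and}\quad \lim_{n\to\infty}\frac{1}{n}L(\mathcal{C}_n) \le \delta(\epsilon)$$
for some arbitrary $\epsilon > 0$. This shows the existence of wiretap codes with "close" to full secrecy and
$$\mathcal{R}^1(p_X) \triangleq \left\{(R,R_e): \begin{array}{l} 0\le R\le I(X;Y|Z)\\ 0\le R_e\le R\end{array}\right\} \subseteq \mathcal{R}^{\mathrm{DWTC}}.$$
The region $\mathcal{R}^1(p_X)$ contains the full secrecy rate $R < I(X;Y|Z)$, but is, in general, strictly smaller than $\mathcal{R}^{\mathrm{DWTC}}(p_X)$ defined in Theorem 3.2.

2. We show that $\mathcal{R}^{\mathrm{DWTC}}(p_X) \subseteq \mathcal{R}^{\mathrm{DWTC}}$ with a minor modification of the codes $\{\mathcal{C}_n\}_{n\ge1}$ analyzed in Step 1.

3. We show that $\mathcal{R}^{\mathrm{DWTC}}$ is convex.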


Step 1. Random-coding argument
We prove the existence of a sequence of $(2^{nR}, n)$ codes $\{\mathcal{C}_n\}_{n\ge1}$ for the DWTC with a binning structure as in Figure 3.8 such that
$$\lim_{n\to\infty} P_e(\mathcal{C}_n) = 0,\qquad \lim_{n\to\infty}\frac{1}{n}H(R|Z^nM\mathcal{C}_n) = 0, \qquad (3.11)$$
$$\lim_{n\to\infty}\frac{1}{n}L(\mathcal{C}_n) \le \delta(\epsilon). \qquad (3.12)$$
The existence of these codes is established by choosing a specific source of local randomness and by combining the two constraints in (3.11) into a single reliability constraint for the enhanced DWTC illustrated in Figure 3.9. This channel enhances the original DWTC by

• introducing a virtual receiver, hereafter named Charlie, who observes the same channel output $Z^n$ as Eve in the original DWTC, but who also has access to $M$ through an error-free side channel;

• using a message $M_d$ with uniform distribution over $[\![1, 2^{nR_d}]\!]$ in place of the source of local randomness $(\mathcal{R}, p_R)$, and by requiring $M_d$ to be reliably decoded by both Bob and Charlie.


Formally, a code for the enhanced channel is defined as follows.
Definition 3.5. A $(2^{nR}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ for the enhanced DWTC consists of

• two message sets, $\mathcal{M} = [\![1, 2^{nR}]\!]$ and $\mathcal{M}_d = [\![1, 2^{nR_d}]\!]$;

• an encoding function $f: \mathcal{M}\times\mathcal{M}_d \to \mathcal{X}^n$, which maps each message pair $(m, m_d)$ to a codeword $x^n$;

• a decoding function $g: \mathcal{Y}^n \to (\mathcal{M}\times\mathcal{M}_d)\cup\{?\}$, which maps each channel observation $y^n$ to a message pair $(\hat{m}, \hat{m}_d)\in\mathcal{M}\times\mathcal{M}_d$ or an error message $?$;

• a decoding function $h: \mathcal{Z}^n\times\mathcal{M} \to \mathcal{M}_d\cup\{?\}$, which maps each channel observation $z^n$ and its corresponding message $m$ to a message $\tilde{m}_d\in\mathcal{M}_d$ or an error message $?$.

Figure 3.9 Enhanced DWTC.


We assume that $M$ and $M_d$ are uniformly distributed in their respective sets. The reliability performance of a $(2^{nR}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ is measured in terms of its average probability of error
$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\big[(\hat{M}, \hat{M}_d)\neq(M, M_d)\ \text{or}\ \tilde{M}_d\neq M_d\,\big|\,\mathcal{C}_n\big].$$
The message $M_d$ is a dummy message that corresponds to a specific choice for the source of local randomness $(\mathcal{R}, p_R)$ in the original DWTC. Hence, a $(2^{nR}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ for the enhanced channel is also a $(2^{nR}, n)$ code $\mathcal{C}_n$ for the original DWTC. By construction, the probability of error for the DWTC does not exceed the probability of error for the enhanced DWTC, since
$$\mathbb{P}\big[\hat{M}\neq M\,\big|\,\mathcal{C}_n\big] \le \mathbb{P}\big[(\hat{M}, \hat{M}_d)\neq(M, M_d)\ \text{or}\ \tilde{M}_d\neq M_d\,\big|\,\mathcal{C}_n\big] = P_e(\mathcal{C}_n).$$
In addition, from Fano's inequality, we have
$$\frac{1}{n}H(M_d|Z^nM\mathcal{C}_n) \le \delta(P_e(\mathcal{C}_n)).$$
Therefore, if $\lim_{n\to\infty}P_e(\mathcal{C}_n) = 0$, the constraints of (3.11) are automatically satisfied with $M_d$ in place of $R$.

The leakage guaranteed by the codes $\mathcal{C}_n$ is formally calculated in the next paragraphs. Nevertheless, it is useful to understand intuitively why the structure of $\mathcal{C}_n$ makes this calculation possible. By using the condition $(1/n)H(M_d|Z^nM\mathcal{C}_n)\approx 0$ in (3.10), we write the leakage $(1/n)L(\mathcal{C}_n)$ as
$$\begin{aligned}
\frac{1}{n}L(\mathcal{C}_n) &= \frac{1}{n}I(M;Z^n|\mathcal{C}_n)\\
&= \frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) - \frac{1}{n}H(M_d|\mathcal{C}_n) + \frac{1}{n}H(M_d|Z^nM\mathcal{C}_n)\\
&\approx \frac{1}{n}I(X^n;Z^n|\mathcal{C}_n) - R_d.
\end{aligned}$$
Notice that the dummy-message rate $R_d$ counterbalances the information rate $(1/n)I(X^n;Z^n|\mathcal{C}_n)$ about codewords leaked to the eavesdropper. In what follows, we design $\mathcal{C}_n$ carefully so that the dummy-message rate almost cancels out the information leaked to the eavesdropper.


Remark 3.10. Consider a DWTC such that, for any distribution $p_X$ on $\mathcal{X}$,
$$I(X;Y) - I(X;Z) = 0\quad\text{or}\quad I(X;Z) = 0.$$
If $I(X;Y) - I(X;Z) = 0$, the equivocation bound in (3.7) reduces to $R_e = 0$, which is always achieved, as has already been discussed in Remark 3.4. If $I(X;Z) = 0$, the eavesdropper's observation is independent of the channel input, which automatically ensures full secrecy $R_e = R$. In both cases, the achievability proof reduces to that of the channel coding theorem for the DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.


We now go back to the construction of codes for the enhanced DWTC. We begin by choosing a distribution $p_X$ on $\mathcal{X}$ and, following the discussion in Remark 3.10, we assume without loss of generality that $p_X$ is such that
$$I(X;Y) - I(X;Z) > 0\quad\text{and}\quad I(X;Z) > 0.$$
Let $0 < \epsilon < \mu_{XYZ}$, where
$$\mu_{XYZ} \triangleq \min_{(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}} p_{XYZ}(x,y,z),$$
and let $n\in\mathbb{N}^*$. Let $R > 0$ and $R_d > 0$ be rates to be specified later. We construct a $(2^{nR}, 2^{nR_d}, n)$ code for the enhanced DWTC as follows.


Codebook construction. Construct a codebook $\mathcal{C}_n$ with $\lceil 2^{nR}\rceil\lceil 2^{nR_d}\rceil$ codewords labeled $x^n(m, m_d)$ with $m\in[\![1, 2^{nR}]\!]$ and $m_d\in[\![1, 2^{nR_d}]\!]$. To do so, generate the symbols $x_i(m, m_d)$ for $i\in[\![1, n]\!]$, $m\in[\![1, 2^{nR}]\!]$, and $m_d\in[\![1, 2^{nR_d}]\!]$ independently according to $p_X$. In terms of the binning structure of Figure 3.8, $m_d$ indexes the codewords within the bin corresponding to message $m$. The codebook is revealed to Alice, Bob, and Charlie.

Alice's encoder f. Given $(m, m_d)$, transmit $x^n(m, m_d)$.

Bob's decoder g. Given $y^n$, output $(\hat{m}, \hat{m}_d)$ if it is the unique message pair such that $(x^n(\hat{m}, \hat{m}_d), y^n)\in\mathcal{T}_\epsilon^n(XY)$; otherwise, output an error $?$.

Charlie's decoder h. Given $z^n$ and $m$, output $\tilde{m}_d$ if it is the unique message such that $(x^n(m, \tilde{m}_d), z^n)\in\mathcal{T}_\epsilon^n(XZ)$; otherwise, output an error $?$.
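The construction above can be sketched in Python for tiny parameters. This is an illustrative sketch, not the authors' implementation, and all function names are hypothetical. It assumes the robust notion of typicality, in which every empirical pair frequency must lie within a factor $1\pm\epsilon$ of $p_{XY}$ (so pairs with zero probability may never occur). `None` stands for the error symbol $?$. The exhaustive searches are only practical for very small codebooks.

```python
import random
from collections import Counter


def build_codebook(n, num_bins, bin_size, p_x, seed=0):
    """Codewords x^n(m, m_d) with i.i.d. symbols drawn from p_x (dict symbol -> prob).

    m in range(num_bins) labels the bin (secret message); m_d in range(bin_size)
    labels the codeword inside the bin (dummy message).
    """
    rng = random.Random(seed)
    symbols, weights = zip(*p_x.items())
    return {(m, md): tuple(rng.choices(symbols, weights=weights, k=n))
            for m in range(num_bins) for md in range(bin_size)}


def encode(codebook, m, md):
    """Alice's encoder f: transmit x^n(m, m_d)."""
    return codebook[(m, md)]


def jointly_typical(xn, yn, p_xy, eps):
    """Robust typicality test: |empirical(a, b) - p(a, b)| <= eps * p(a, b) for all pairs."""
    n = len(xn)
    counts = Counter(zip(xn, yn))
    return all(abs(counts.get(ab, 0) / n - p_xy.get(ab, 0.0)) <= eps * p_xy.get(ab, 0.0)
               for ab in set(p_xy) | set(counts))


def bob_decode(codebook, yn, p_xy, eps):
    """Bob's decoder g: the unique (m, m_d) whose codeword is typical with y^n, else None."""
    hits = [key for key, xn in codebook.items() if jointly_typical(xn, yn, p_xy, eps)]
    return hits[0] if len(hits) == 1 else None


def charlie_decode(codebook, zn, m, p_xz, eps):
    """Charlie's decoder h: knows m and returns the unique typical m_d, else None."""
    hits = [md for (mm, md), xn in codebook.items()
            if mm == m and jointly_typical(xn, zn, p_xz, eps)]
    return hits[0] if len(hits) == 1 else None


cb = build_codebook(n=8, num_bins=2, bin_size=4, p_x={0: 0.5, 1: 0.5})
print(len(cb), encode(cb, 1, 3))
```

The dictionary key `(m, m_d)` mirrors the label $x^n(m, m_d)$, so the bin of message $m$ is simply the set of keys with first component $m$.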
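The random variable that represents the randomly generated codebook $\mathcal{C}_n$ is denoted by $\mathsf{C}_n$. First, we develop an upper bound for $\mathbb{E}[P_e(\mathsf{C}_n)]$ as in the proof of the channel coding theorem. Note that, from the symmetry of the random-coding construction, we have
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{E}_{\mathsf{C}_n}\Big[\mathbb{P}\big[(\hat{M}, \hat{M}_d)\neq(M, M_d)\ \text{or}\ \tilde{M}_d\neq M_d\,\big|\,\mathsf{C}_n, M = 1, M_d = 1\big]\Big].$$
Thus, without loss of generality, we can assume that $M = 1$ and $M_d = 1$, and we can express $\mathbb{E}[P_e(\mathsf{C}_n)]$ in terms of the events
$$E_{ij} = \big\{(X^n(i,j), Y^n)\in\mathcal{T}_\epsilon^n(XY)\big\}\quad\text{for } (i,j)\in[\![1, 2^{nR}]\!]\times[\![1, 2^{nR_d}]\!],$$
$$F_i = \big\{(X^n(1,i), Z^n)\in\mathcal{T}_\epsilon^n(XZ)\big\}\quad\text{for } i\in[\![1, 2^{nR_d}]\!]$$
as
$$\mathbb{E}[P_e(\mathsf{C}_n)] = \mathbb{P}\Big[E_{11}^c \cup \bigcup_{(i,j)\neq(1,1)} E_{ij} \cup F_1^c \cup \bigcup_{i\neq1} F_i\Big].$$
By the union bound,
$$\mathbb{E}[P_e(\mathsf{C}_n)] \le \mathbb{P}[E_{11}^c] + \sum_{(i,j)\neq(1,1)}\mathbb{P}[E_{ij}] + \mathbb{P}[F_1^c] + \sum_{i\neq1}\mathbb{P}[F_i]. \qquad (3.13)$$
By the AEP, we know that
$$\mathbb{P}[E_{11}^c] \le \delta_\epsilon(n)\quad\text{and}\quad \mathbb{P}[F_1^c] \le \delta_\epsilon(n). \qquad (3.14)$$
For $(i,j)\neq(1,1)$, $X^n(i,j)$ is independent of $Y^n$; hence, Corollary 2.2 applies and
$$\mathbb{P}[E_{ij}] \le 2^{-n(I(X;Y)-\delta(\epsilon))}\quad\text{for } (i,j)\neq(1,1). \qquad (3.15)$$
Similarly, for $i\neq1$, $X^n(1,i)$ is independent of $Z^n$ and, by Corollary 2.2,
$$\mathbb{P}[F_i] \le 2^{-n(I(X;Z)-\delta(\epsilon))}\quad\text{for } i\neq1. \qquad (3.16)$$
On substituting (3.14), (3.15), and (3.16) into (3.13), we obtain
$$\mathbb{E}[P_e(\mathsf{C}_n)] \le \delta_\epsilon(n) + \lceil 2^{nR}\rceil\lceil 2^{nR_d}\rceil\, 2^{-n(I(X;Y)-\delta(\epsilon))} + \lceil 2^{nR_d}\rceil\, 2^{-n(I(X;Z)-\delta(\epsilon))}. \qquad (3.17)$$
Hence, if we choose the rates $R$ and $R_d$ to satisfy
$$R + R_d < I(X;Y) - \delta(\epsilon)\quad\text{and}\quad R_d < I(X;Z) - \delta(\epsilon), \qquad (3.18)$$
then (3.17) implies that
$$\mathbb{E}[P_e(\mathsf{C}_n)] \le \delta_\epsilon(n). \qquad (3.19)$$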


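Next, we compute an upper bound for $(1/n)\mathbb{E}[L(\mathsf{C}_n)]$. Following the same steps as in (3.10), we obtain
$$\begin{aligned}
\frac{1}{n}\mathbb{E}[L(\mathsf{C}_n)] &= \frac{1}{n}I(M;Z^n|\mathsf{C}_n)\\
&= \frac{1}{n}I(X^n;Z^n|\mathsf{C}_n) - \frac{1}{n}H(M_d|\mathsf{C}_n) + \frac{1}{n}H(M_d|Z^nM\mathsf{C}_n). \qquad (3.20)
\end{aligned}$$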
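We proceed to bound each of the three terms on the right-hand side of (3.20) separately. First, notice that, by construction, all randomly generated codes contain the same number of codewords; in addition, all these codewords are used with equal probability. Therefore,
$$\frac{1}{n}H(M_d|\mathsf{C}_n) = \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\,\frac{1}{n}H(M_d|\mathcal{C}_n) = \frac{1}{n}\log\lceil 2^{nR_d}\rceil \ge R_d. \qquad (3.21)$$
Next, by Fano's inequality,
$$\begin{aligned}
\frac{1}{n}H(M_d|MZ^n\mathsf{C}_n) &= \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\,\frac{1}{n}H(M_d|MZ^n\mathcal{C}_n)\\
&\le \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\Big(\frac{1}{n} + P_e(\mathcal{C}_n)\,\frac{1}{n}\log\lceil 2^{nR_d}\rceil\Big)\\
&= \delta(n) + \mathbb{E}[P_e(\mathsf{C}_n)]\big(R_d + \delta(n)\big)\\
&\le \delta_\epsilon(n), \qquad (3.22)
\end{aligned}$$
where the last inequality follows from (3.19). Finally, note that $\mathsf{C}_n\to X^n\to Z^n$ forms a Markov chain. Therefore,
$$\frac{1}{n}I(X^n;Z^n|\mathsf{C}_n) \le \frac{1}{n}I(X^n;Z^n) = I(X;Z), \qquad (3.23)$$
since $(X^n, Z^n)$ is i.i.d. according to $p_{XZ}$. On substituting (3.21), (3.22), and (3.23) into (3.20), we obtain
$$\frac{1}{n}\mathbb{E}[L(\mathsf{C}_n)] \le I(X;Z) - R_d + \delta_\epsilon(n).$$
In particular, for the specific choice
$$R < I(X;Y) - I(X;Z)\quad\text{and}\quad R_d = I(X;Z) - \delta(\epsilon), \qquad (3.24)$$
which is compatible with the conditions in (3.18), we obtain
$$\frac{1}{n}\mathbb{E}[L(\mathsf{C}_n)] \le \delta(\epsilon) + \delta_\epsilon(n).$$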
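The rate bookkeeping in (3.18) and (3.24) can be made concrete with a small calculation. This is a minimal sketch that assumes, purely for illustration, a uniform input on a DWTC made of a BSC(p) main channel followed by a BSC(q). It replaces the generic slack $\delta(\epsilon)$ with a fixed number `delta`. To keep the second condition of (3.18) strict, it takes $R_d = I(X;Z) - 2\delta$, which only doubles the leakage bound.

```python
from math import log2


def hb(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


p, q = 0.1, 0.1       # assumed crossover probabilities of BSC(p) and BSC(q)
delta = 0.01          # stands in for delta(epsilon); arbitrary illustrative value

I_XY = 1 - hb(p)                    # uniform input on the main channel BSC(p)
I_XZ = 1 - hb(p + q - 2 * p * q)    # the eavesdropper sees a BSC(p + q - 2pq)

R_d = I_XZ - 2 * delta              # dummy-message rate
R = (I_XY - I_XZ) - delta           # secret-message rate, below I(X;Y) - I(X;Z)

reliability_ok = (R + R_d < I_XY - delta) and (R_d < I_XZ - delta)   # conditions (3.18)
leakage_rate_bound = I_XZ - R_d     # limit of the bound on (1/n) E[L(C_n)] as n grows

print(f"I(X;Y) = {I_XY:.4f}, I(X;Z) = {I_XZ:.4f}")
print(f"R = {R:.4f}, R_d = {R_d:.4f}, (3.18) satisfied: {reliability_ok}")
print(f"asymptotic leakage-rate bound = {leakage_rate_bound:.4f} bits")
```

The dummy message consumes almost all of $I(X;Z)$, which is why the leakage bound collapses to the slack alone.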


Finally, by applying the selection lemma to the random variable $\mathsf{C}_n$ and the functions $P_e$ and $L$, we conclude that there exists a specific code $\mathcal{C}_n$ such that $P_e(\mathcal{C}_n) \le \delta_\epsilon(n)$ and $(1/n)L(\mathcal{C}_n) \le \delta(\epsilon) + \delta_\epsilon(n)$. Consequently, there exists a sequence of $(2^{nR}, n)$ codes $\{\mathcal{C}_n\}_{n\ge1}$ such that
$$\lim_{n\to\infty} P_e(\mathcal{C}_n) = 0\quad\text{and}\quad \lim_{n\to\infty}\frac{1}{n}L(\mathcal{C}_n) \le \delta(\epsilon),$$
and the rate-equivocation pair $(R, R - \delta(\epsilon))$ is achievable. Since $\epsilon$ can be chosen arbitrarily small, since $R$ satisfies the conditions in (3.24), and since $I(X;Y) - I(X;Z) = I(X;Y|Z)$ for a DWTC, we conclude that
$$\mathcal{R}^1(p_X) \triangleq \left\{(R,R_e): \begin{array}{l} 0\le R\le I(X;Y|Z)\\ 0\le R_e\le R\end{array}\right\} \subseteq \mathcal{R}^{\mathrm{DWTC}}. \qquad (3.25)$$

Figure 3.10 Rate-equivocation regions $\mathcal{R}^1(p_X)$ given by (3.25) and $\mathcal{R}^{\mathrm{DWTC}}(p_X)$ given in Theorem 3.2.
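Remark 3.11. The fact that the channel is degraded has not been used to obtain (3.25). Therefore, every rate-equivocation pair in
$$\left\{(R,R_e): \begin{array}{l} 0\le R\le I(X;Y) - I(X;Z)\\ 0\le R_e\le R\end{array}\right\}$$
is achievable for any DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$, not just for those of the form $(\mathcal{X}, p_{Z|Y}p_{Y|X}, \mathcal{Y}, \mathcal{Z})$.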


Step 2. Achieving the entire region $\mathcal{R}^{\mathrm{DWTC}}(p_X)$
As illustrated in Figure 3.10, the region $\mathcal{R}^1(p_X)$ given in (3.25) is only a subset of the region $\mathcal{R}^{\mathrm{DWTC}}(p_X)$. The key idea to achieve the full region is to modify the $(2^{nR}, 2^{nR_d}, n)$ codes identified in Step 1 and to exploit part of the dummy message $M_d$ to transmit additional information. However, we have to be careful: the analysis of the probability of error and leakage for the $(2^{nR}, 2^{nR_d}, n)$ codes assumed that $M$ and $M_d$ were uniformly distributed. Hence, we must check that our modifications do not affect the results. In the following paragraphs, we prove that this is indeed the case, but in the remainder of the book, we overlook this subtlety.

Consider a $(2^{nR}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ identified in Step 1 with $R_d = I(X;Z) - \delta(\epsilon)$ and $R < I(X;Y) - I(X;Z)$. When $M$ and $M_d$ are uniformly distributed, this code satisfies $P_e(\mathcal{C}_n) \le \delta_\epsilon(n)$ and $(1/n)L(\mathcal{C}_n) \le \delta(\epsilon) + \delta_\epsilon(n)$. For $R' < R_d$, note that $\lceil 2^{nR'}\rceil$ might not divide $\lceil 2^{nR_d}\rceil$, and, by Euclidean division,
$$\lceil 2^{nR_d}\rceil = q\lceil 2^{nR'}\rceil + r,$$
for some integer $q > 0$ and some integer $0 \le r < \lceil 2^{nR'}\rceil$. As illustrated in Figure 3.11, for each $m\in[\![1, 2^{nR}]\!]$, we distribute the codewords $x^n(m, m_d)$ with $m_d\in[\![1, 2^{nR_d}]\!]$ into sub-bins $\mathcal{B}_m(i)$ with $i\in[\![1, 2^{nR'}]\!]$, such that $r$ of the sub-bins have size $q + 1$ while the remaining $\lceil 2^{nR'}\rceil - r$ have size $q$. We relabel the codewords $x^n(m, m', k)$ with $m\in[\![1, 2^{nR}]\!]$, $m'\in[\![1, 2^{nR'}]\!]$, and $k\in[\![1, q+1]\!]$ or $k\in[\![1, q]\!]$. The sub-binning is revealed to all parties, and we consider the following encoding/decoding procedure.

Figure 3.11 Sub-binning of codewords in $\mathcal{C}_n$.

• Encoder. Given $m$ and $m'$, transmit a codeword $x^n(m, m', k)\in\mathcal{C}_n$ chosen uniformly at random in $\mathcal{B}_m(m')$.
• Decoder. Given $y^n$, use the decoding procedure of $\mathcal{C}_n$.

The sub-binning together with the encoding defines a $(2^{nR''}, n)$ code $\hat{\mathcal{C}}_n$ for the DWTC, with $R'' = R + R' + \delta(n)$. We assume that $M$ and $M'$ are uniformly distributed. However, because the sub-bins have different sizes, the distribution of the codewords is now slightly non-uniform. In fact, with our encoding scheme, some codewords $x^n$ are selected with probability $p_{X^n|\hat{\mathcal{C}}_n}(x^n) = 1/(\lceil 2^{nR}\rceil\lceil 2^{nR'}\rceil q)$, while others are selected with probability $p_{X^n|\hat{\mathcal{C}}_n}(x^n) = 1/(\lceil 2^{nR}\rceil\lceil 2^{nR'}\rceil (q+1))$. Nevertheless, the reader can check that
$$\sum_{x^n\in\mathcal{C}_n}\left|p_{X^n|\hat{\mathcal{C}}_n}(x^n) - \frac{1}{\lceil 2^{nR}\rceil\lceil 2^{nR_d}\rceil}\right| \le \delta(n);$$
that is, the variational distance between the distribution of codewords $p_{X^n|\hat{\mathcal{C}}_n}$ and the uniform distribution over $\mathcal{C}_n$ vanishes for $n$ large enough. Consequently, the probability of error $P_e(\hat{\mathcal{C}}_n)$ satisfies
$$P_e(\hat{\mathcal{C}}_n) \le P_e(\mathcal{C}_n) + \delta(n) \le \delta_\epsilon(n).$$


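In addition, the equivocation $(1/n)E(\hat{\mathcal{C}}_n)$ satisfies
$$\begin{aligned}
\frac{1}{n}E(\hat{\mathcal{C}}_n) &= \frac{1}{n}H(MM'|Z^n\hat{\mathcal{C}}_n)\\
&\ge \frac{1}{n}H(M|Z^n\hat{\mathcal{C}}_n)\\
&= \frac{1}{n}H(M|\hat{\mathcal{C}}_n) - \frac{1}{n}I(M;Z^n|\hat{\mathcal{C}}_n).
\end{aligned}$$
Since $M$ is chosen uniformly at random, $H(M|\hat{\mathcal{C}}_n) = H(M|\mathcal{C}_n)$. Also, $I(M;Z^n|\hat{\mathcal{C}}_n)$ is a continuous function of $p_{X^n|\hat{\mathcal{C}}_n}$; therefore,
$$\frac{1}{n}I(M;Z^n|\hat{\mathcal{C}}_n) \le \frac{1}{n}I(M;Z^n|\mathcal{C}_n) + \delta(n) = \frac{1}{n}L(\mathcal{C}_n) + \delta(n).$$
Hence,
$$\begin{aligned}
\frac{1}{n}E(\hat{\mathcal{C}}_n) &\ge \frac{1}{n}H(M|\mathcal{C}_n) - \frac{1}{n}L(\mathcal{C}_n) - \delta(n)\\
&\ge R - \delta(\epsilon) - \delta_\epsilon(n).
\end{aligned}$$
Therefore, the rate-equivocation pair $(R + R', R - \delta(\epsilon))$ is achievable. Since $R'$ can be chosen as large as $I(X;Z) - \delta(\epsilon)$ and since $\epsilon$ can be chosen arbitrarily small, we conclude that
$$\mathcal{R}^{\mathrm{DWTC}}(p_X) = \left\{(R,R_e): \begin{array}{l} 0\le R_e\le R\le I(X;Y)\\ 0\le R_e\le I(X;Y|Z)\end{array}\right\} \subseteq \mathcal{R}^{\mathrm{DWTC}}. \qquad (3.26)$$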
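The claim that the sub-binned codeword distribution is close to uniform can be checked exactly. Write $N = \lceil 2^{nR_d}\rceil$, $K = \lceil 2^{nR'}\rceil$, and $N = qK + r$. A short calculation shows that the variational sum above equals $2r(K-r)/(KN) \le K/(2N)$, which vanishes when $R' < R_d$. The Python sketch below evaluates the sum directly for the assumed rates $R_d = 0.31$ and $R' = 0.23$. Every bin contributes identically, so the number of bins $\lceil 2^{nR}\rceil$ cancels out and is not a parameter.

```python
from math import ceil


def sub_binning_distance(n, R_d, R_prime):
    """Variational sum between the sub-binned codeword law and the uniform law.

    A bin of N = ceil(2^{n R_d}) codewords is split into K = ceil(2^{n R'}) sub-bins:
    r sub-bins of size q + 1 and K - r of size q, where N = q K + r. Given its bin,
    a codeword in a sub-bin of size s is sent with probability 1/(K s), versus 1/N
    under the uniform law. Each bin contributes this per-bin sum divided by the
    number of bins, so the per-bin sum equals the sum over the whole codebook.
    """
    N = ceil(2 ** (n * R_d))
    K = ceil(2 ** (n * R_prime))
    q, r = divmod(N, K)
    assert q > 0, "need R' < R_d so that every sub-bin is non-empty"
    dist = r * (q + 1) * abs(1 / (K * (q + 1)) - 1 / N)   # codewords in larger sub-bins
    dist += (K - r) * q * abs(1 / (K * q) - 1 / N)         # codewords in smaller sub-bins
    return N, K, q, r, dist


for n in (10, 20, 40, 80):
    N, K, q, r, dist = sub_binning_distance(n, R_d=0.31, R_prime=0.23)
    print(f"n={n:2d}  N={N:>9}  K={K:>7}  q={q:>3}  r={r:>6}  "
          f"distance={dist:.2e}  bound K/(2N)={K / (2 * N):.2e}")
```

The distance itself need not decrease monotonically with $n$, because $r$ fluctuates. The bound $K/(2N) \approx 2^{-n(R_d - R')}/2$ decreases steadily, however, which is what $\delta(n)$ captures.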


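Step 3. Convexity of the rate-equivocation region
We show that $\mathcal{R}^{\mathrm{DWTC}}$ is convex by proving that, for any distributions $p_{X_1}$ and $p_{X_2}$ on $\mathcal{X}$, the convex hull of $\mathcal{R}^{\mathrm{DWTC}}(p_{X_1})\cup\mathcal{R}^{\mathrm{DWTC}}(p_{X_2})$ is in $\mathcal{R}^{\mathrm{DWTC}}$.

Let $(R_1, R_{e1})\in\mathcal{R}^{\mathrm{DWTC}}(p_{X_1})$ be a rate-equivocation pair satisfying the inequalities in (3.26) for some random variables $X_1$, $Y_1$, and $Z_1$ whose joint distribution is such that
$$\forall (x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\quad p_{X_1Y_1Z_1}(x,y,z) = p_{X_1}(x)\,p_{Y|X}(y|x)\,p_{Z|Y}(z|y).$$
Similarly, let $(R_2, R_{e2})\in\mathcal{R}^{\mathrm{DWTC}}(p_{X_2})$ be a rate-equivocation pair satisfying the inequalities in (3.26) for some random variables $X_2$, $Y_2$, and $Z_2$ whose joint distribution is such that
$$\forall (x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\quad p_{X_2Y_2Z_2}(x,y,z) = p_{X_2}(x)\,p_{Y|X}(y|x)\,p_{Z|Y}(z|y).$$
Our objective is to show that, for any $\lambda\in[0,1]$, there exists a distribution $p_{X_\lambda}$ on $\mathcal{X}$ such that
$$\big(\lambda R_1 + (1-\lambda)R_2,\ \lambda R_{e1} + (1-\lambda)R_{e2}\big)\in\mathcal{R}^{\mathrm{DWTC}}(p_{X_\lambda}).$$
We define a random variable $Q\in\{1,2\}$ that is independent of all others such that
$$Q \triangleq \begin{cases}1 & \text{with probability } \lambda,\\ 2 & \text{with probability } 1-\lambda.\end{cases}$$
By construction, $Q\to X_Q\to Y_Q\to Z_Q$ forms a Markov chain and the joint distribution of $X_Q$, $Y_Q$, and $Z_Q$ satisfies
$$\forall (x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\quad p_{X_QY_QZ_Q}(x,y,z) = p_{X_Q}(x)\,p_{Y|X}(y|x)\,p_{Z|Y}(z|y).$$
We set $X_\lambda \triangleq X_Q$, $Y_\lambda \triangleq Y_Q$, and $Z_\lambda \triangleq Z_Q$. Then,
$$\begin{aligned}
I(X_\lambda;Y_\lambda) &= I(X_Q;Y_Q)\\
&\ge I(X_Q;Y_Q|Q)\\
&= \lambda I(X_1;Y_1) + (1-\lambda)I(X_2;Y_2)\\
&\ge \lambda R_1 + (1-\lambda)R_2,
\end{aligned}$$
and, similarly,
$$\begin{aligned}
I(X_\lambda;Y_\lambda|Z_\lambda) &= I(X_Q;Y_Q|Z_Q)\\
&\ge I(X_Q;Y_Q|Z_QQ)\\
&= \lambda I(X_1;Y_1|Z_1) + (1-\lambda)I(X_2;Y_2|Z_2)\\
&\ge \lambda R_{e1} + (1-\lambda)R_{e2}.
\end{aligned}$$
Hence, for any $\lambda\in[0,1]$, there exists $X_\lambda$ such that
$$\big(\lambda R_1 + (1-\lambda)R_2,\ \lambda R_{e1} + (1-\lambda)R_{e2}\big)\in\mathcal{R}^{\mathrm{DWTC}}(p_{X_\lambda})\subseteq\mathcal{R}^{\mathrm{DWTC}}.$$
Therefore, the convex hull of $\mathcal{R}^{\mathrm{DWTC}}(p_{X_1})\cup\mathcal{R}^{\mathrm{DWTC}}(p_{X_2})$ is included in $\mathcal{R}^{\mathrm{DWTC}}$, and $\mathcal{R}^{\mathrm{DWTC}}$ is convex.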


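Converse proof for the degraded wiretap channel

Let $(R, R_e)\in\mathcal{R}^{\mathrm{DWTC}}$ be an achievable rate-equivocation pair and let $\epsilon > 0$. For $n$ sufficiently large, there exists a $(2^{nR}, n)$ code $\mathcal{C}_n$ such that
$$\frac{1}{n}H(M|\mathcal{C}_n) \ge R,\qquad \frac{1}{n}E(\mathcal{C}_n) \ge R_e - \delta(\epsilon),\qquad P_e(\mathcal{C}_n) \le \delta(\epsilon).$$
In the remainder of this section, we drop the conditioning on $\mathcal{C}_n$ to simplify the notation. By Fano's inequality, we have
$$\frac{1}{n}H(M|Y^nZ^n) \le \frac{1}{n}H(M|Y^n) \le \delta(P_e(\mathcal{C}_n)) = \delta(\epsilon).$$
Therefore,
$$\begin{aligned}
R &\le \frac{1}{n}H(M)\\
&= \frac{1}{n}I(M;Y^n) + \frac{1}{n}H(M|Y^n)\\
&\overset{(a)}{\le} \frac{1}{n}I(X^n;Y^n) + \delta(\epsilon)\\
&= \frac{1}{n}H(Y^n) - \frac{1}{n}H(Y^n|X^n) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n\big(H(Y_i|Y^{i-1}) - H(Y_i|X^nY^{i-1})\big) + \delta(\epsilon)\\
&\overset{(b)}{=} \frac{1}{n}\sum_{i=1}^n\big(H(Y_i|Y^{i-1}) - H(Y_i|X_iY^{i-1})\big) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n I(X_i;Y_i|Y^{i-1}) + \delta(\epsilon), \qquad (3.27)
\end{aligned}$$
where (a) follows from the data-processing inequality applied to the Markov chain $M\to X^n\to Y^n$ and the bound $(1/n)H(M|Y^n)\le\delta(\epsilon)$, and (b) follows because the channel is memoryless. Similarly, we bound the equivocation rate $R_e$ as
$$\begin{aligned}
R_e &\le \frac{1}{n}H(M|Z^n) + \delta(\epsilon)\\
&= \frac{1}{n}I(M;Y^n|Z^n) + \frac{1}{n}H(M|Y^nZ^n) + \delta(\epsilon)\\
&\overset{(a)}{\le} \frac{1}{n}I(M;Y^n|Z^n) + \delta(\epsilon)\\
&\overset{(b)}{\le} \frac{1}{n}I(X^n;Y^n|Z^n) + \delta(\epsilon)\\
&= \frac{1}{n}H(Y^n|Z^n) - \frac{1}{n}H(Y^n|X^nZ^n) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n\big(H(Y_i|Y^{i-1}Z^n) - H(Y_i|Y^{i-1}X^nZ^n)\big) + \delta(\epsilon)\\
&\overset{(c)}{\le} \frac{1}{n}\sum_{i=1}^n\big(H(Y_i|Y^{i-1}Z_i) - H(Y_i|Y^{i-1}X_iZ_i)\big) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n I(X_i;Y_i|Z_iY^{i-1}) + \delta(\epsilon), \qquad (3.28)
\end{aligned}$$
where (a) follows from the bound $(1/n)H(M|Y^nZ^n)\le\delta(\epsilon)$, (b) follows from the data-processing inequality applied to the Markov chain $M\to X^n\to Y^n\to Z^n$, and (c) follows from two facts. First, $H(Y_i|Y^{i-1}X^nZ^n) = H(Y_i|Y^{i-1}X_iZ_i)$ because the channel is memoryless. Second, $H(Y_i|Y^{i-1}Z^n) \le H(Y_i|Y^{i-1}Z_i)$ since conditioning does not increase entropy.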

We now introduce a random variable $Q$, which is independent of all other random variables and uniformly distributed in $[\![1, n]\!]$, so that we can rewrite (3.27) and (3.28) as
$$\begin{aligned}
R &\le \sum_{i=1}^n\frac{1}{n}I(X_i;Y_i|Y^{i-1}) + \delta(\epsilon) = I(X_Q;Y_Q|Y^{Q-1}Q) + \delta(\epsilon),\\
R_e &\le \sum_{i=1}^n\frac{1}{n}I(X_i;Y_i|Z_iY^{i-1}) + \delta(\epsilon) = I(X_Q;Y_Q|Z_QY^{Q-1}Q) + \delta(\epsilon).
\end{aligned}\qquad (3.29)$$
Finally, we define the random variables
$$X \triangleq X_Q,\quad Y \triangleq Y_Q,\quad Z \triangleq Z_Q,\quad\text{and}\quad U \triangleq Y^{Q-1}Q. \qquad (3.30)$$
Note that $U\to X\to Y\to Z$ forms a Markov chain and that the transition probabilities $p_{Z|Y}$ and $p_{Y|X}$ are the same as those of the original DWTC. On substituting (3.30) into (3.29), we obtain the conditions
$$\begin{aligned}
0 &\le R_e \le R \le I(X;Y|U) + \delta(\epsilon),\\
0 &\le R_e \le I(X;Y|ZU) + \delta(\epsilon).
\end{aligned}$$

Figure 3.12 Communication over a broadcast channel with confidential messages. $M_0$ represents a common message for both Bob and Eve. $M_1$ represents an individual message for Bob, which should be kept secret from Eve. $R$ represents local randomness used in Alice's encoder.


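Since $U\to X\to Y\to Z$ forms a Markov chain, $I(X;Y|U)\le I(X;Y)$ and $I(X;Y|ZU)\le I(X;Y|Z)$; therefore,
$$\begin{aligned}
0 &\le R_e \le R \le I(X;Y) + \delta(\epsilon),\\
0 &\le R_e \le I(X;Y|Z) + \delta(\epsilon).
\end{aligned}$$
Since $\epsilon$ can be chosen arbitrarily small, we conclude that
$$\mathcal{R}^{\mathrm{DWTC}} \subseteq \bigcup_{p_X}\left\{(R,R_e): \begin{array}{l} 0\le R_e\le R\le I(X;Y)\\ 0\le R_e\le I(X;Y|Z)\end{array}\right\} = \bigcup_{p_X}\mathcal{R}^{\mathrm{DWTC}}(p_X).$$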


Broadcast channel with confidential messages 


The DWTC model is not entirely satisfactory because it explicitly puts the eavesdropper at a disadvantage. Although the achievability proof of Section 3.4.1 does not exploit the degraded nature of the channel, the converse proof does. It is therefore not obvious whether the specific stochastic encoding used in Section 3.4.1 is still optimal for non-degraded channels. In addition, it is useful to characterize the trade-off between reliability and security more precisely; in particular, we would like to investigate whether one could transmit some messages reliably to the eavesdropper while simultaneously concealing other messages.

These issues are resolved by analyzing a more general model than the DWTC. As 
illustrated in Figure 3.12, we consider a broadcast channel with two receivers for which 
a sender attempts to send two messages simultaneously: a common message, which is 
intended for both receivers, and an individual secret message, which is intended for only 
one receiver, treating the other receiver as an eavesdropper. This channel model was 
termed the broadcast channel with confidential messages (BCC for short) by Csiszár 
and Körner. In the absence of a common message, the channel is called a wiretap channel 
(WTC for short). 

Formally, a discrete memoryless BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ consists of a finite input alphabet $\mathcal{X}$, two finite output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and transition probabilities $p_{YZ|X}$ such that
$$\forall n\ge1\quad \forall (x^n,y^n,z^n)\in\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n\quad p_{Y^nZ^n|X^n}(y^n,z^n|x^n) = \prod_{i=1}^n p_{YZ|X}(y_i,z_i|x_i).$$
The marginal probabilities $p_{Y|X}$ and $p_{Z|X}$ define two DMCs. By convention, the DMC $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is the main channel and the DMC $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is the eavesdropper's channel.


Definition 3.6. A $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ for the BCC consists of

• a common message set $\mathcal{M}_0 = [\![1, 2^{nR_0}]\!]$ and an individual message set $\mathcal{M}_1 = [\![1, 2^{nR_1}]\!]$;

• a source of local randomness $(\mathcal{R}, p_R)$;

• an encoding function $f: \mathcal{M}_0\times\mathcal{M}_1\times\mathcal{R}\to\mathcal{X}^n$, which maps a message pair $(m_0, m_1)$ and a realization of local randomness $r$ to a codeword $x^n$;

• a decoding function $g: \mathcal{Y}^n\to(\mathcal{M}_0\times\mathcal{M}_1)\cup\{?\}$, which maps each channel observation $y^n$ to a message pair $(\hat{m}_0, \hat{m}_1)\in\mathcal{M}_0\times\mathcal{M}_1$ or an error message $?$;

• a decoding function $h: \mathcal{Z}^n\to\mathcal{M}_0\cup\{?\}$, which maps each channel observation $z^n$ to a message $\tilde{m}_0\in\mathcal{M}_0$ or an error message $?$.

The $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ is known to Alice, Bob, and Eve, and we assume that messages $M_0$ and $M_1$ are chosen uniformly at random. The reliability performance of the code $\mathcal{C}_n$ is measured in terms of its average probability of error
$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\big[(\hat{M}_0, \hat{M}_1)\neq(M_0, M_1)\ \text{or}\ \tilde{M}_0\neq M_0\,\big|\,\mathcal{C}_n\big],$$
while its secrecy performance is measured in terms of the equivocation
$$E(\mathcal{C}_n) \triangleq H(M_1|Z^n\mathcal{C}_n).$$


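Definition 3.7. A rate tuple $(R_0, R_1, R_e)$ is achievable for the BCC if there exists a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes $\{\mathcal{C}_n\}_{n\ge1}$ such that
$$\lim_{n\to\infty} P_e(\mathcal{C}_n) = 0\qquad\text{(reliability condition)}, \qquad (3.31)$$
$$\lim_{n\to\infty}\frac{1}{n}E(\mathcal{C}_n) \ge R_e\qquad\text{(weak secrecy condition)}. \qquad (3.32)$$
The rate-equivocation region of the BCC is
$$\mathcal{R}^{\mathrm{BCC}} \triangleq \mathrm{cl}\big(\{(R_0, R_1, R_e): (R_0, R_1, R_e)\ \text{is achievable}\}\big).$$
The secrecy-capacity region is
$$\mathcal{C}^{\mathrm{BCC}} \triangleq \mathrm{cl}\big(\{(R_0, R_1): (R_0, R_1, R_1)\in\mathcal{R}^{\mathrm{BCC}}\}\big),$$
the rate-equivocation region of the wiretap channel is
$$\mathcal{R}^{\mathrm{WT}} \triangleq \mathrm{cl}\big(\{(R_1, R_e): (0, R_1, R_e)\in\mathcal{R}^{\mathrm{BCC}}\}\big),$$
and the secrecy capacity is
$$C_s^{\mathrm{WT}} \triangleq \sup\{R: (0, R, R)\in\mathcal{R}^{\mathrm{BCC}}\}.$$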


The regions $\mathcal{C}^{\mathrm{BCC}}$ and $\mathcal{R}^{\mathrm{WT}}$ are specializations of the region $\mathcal{R}^{\mathrm{BCC}}$ that highlight different characteristics of a BCC. The region $\mathcal{C}^{\mathrm{BCC}}$ captures the fundamental trade-off between reliable communication with both Bob and Eve and communication in full secrecy with Bob. The region $\mathcal{R}^{\mathrm{WT}}$ is just the generalization of the rate-equivocation region $\mathcal{R}^{\mathrm{DWTC}}$ defined in Section 3.4 for DWTCs. Replacing the weak secrecy condition (3.32) by the stronger requirement $\lim_{n\to\infty}(E(\mathcal{C}_n) - nR_e) \ge 0$ yields three strong counterparts: the strong rate-equivocation region $\overline{\mathcal{R}}^{\mathrm{BCC}}$, the strong secrecy-capacity region $\overline{\mathcal{C}}^{\mathrm{BCC}}$, and the strong secrecy capacity $\overline{C}_s^{\mathrm{WT}}$.


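Theorem 3.3 (Csiszár and Körner). Consider a BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. For any joint distribution $p_{UVX}$ on $\mathcal{U}\times\mathcal{V}\times\mathcal{X}$ that factorizes as $p_Up_{V|U}p_{X|V}$, define the set $\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V})$ as
$$\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V}) \triangleq \left\{(R_0, R_1, R_e): \begin{array}{l}
0\le R_e\le R_1\\
0\le R_e\le I(V;Y|U) - I(V;Z|U)\\
0\le R_0\le \min\big(I(U;Y), I(U;Z)\big)\\
0\le R_1 + R_0\le I(V;Y|U) + \min\big(I(U;Y), I(U;Z)\big)
\end{array}\right\},$$
where the joint distribution of $U$, $V$, $X$, $Y$, and $Z$ factorizes as $p_Up_{V|U}p_{X|V}p_{YZ|X}$. Then, the rate-equivocation region of the BCC is the convex region
$$\mathcal{R}^{\mathrm{BCC}} = \bigcup_{p_Up_{V|U}p_{X|V}}\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V}).$$
In addition, the cardinality of the sets $\mathcal{U}$ and $\mathcal{V}$ can be limited to
$$|\mathcal{U}| \le |\mathcal{X}| + 3\quad\text{and}\quad |\mathcal{V}| \le |\mathcal{X}|^2 + 4|\mathcal{X}| + 3.$$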
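When Theorem 3.3 is evaluated numerically, it helps to separate the mutual-information terms from the rate constraints. The following sketch defines a hypothetical helper, `in_bcc_region`. It takes the four quantities $I(U;Y)$, $I(U;Z)$, $I(V;Y|U)$, and $I(V;Z|U)$ for one distribution $p_Up_{V|U}p_{X|V}$ and tests whether a triple $(R_0, R_1, R_e)$ satisfies the four inequalities defining $\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V})$. The numbers in the usage lines are made up for illustration and do not come from any particular channel.

```python
def in_bcc_region(R0, R1, Re, I_UY, I_UZ, I_VY_given_U, I_VZ_given_U):
    """Check (R0, R1, Re) against the inequalities of R^BCC(p_U p_{V|U} p_{X|V})."""
    common = min(I_UY, I_UZ)            # rate that both receivers can decode
    return (0 <= Re <= R1
            and Re <= I_VY_given_U - I_VZ_given_U
            and 0 <= R0 <= common
            and R1 + R0 <= I_VY_given_U + common)


# Hypothetical mutual-information values for one choice of p_U p_{V|U} p_{X|V}.
info = dict(I_UY=0.5, I_UZ=0.3, I_VY_given_U=0.4, I_VZ_given_U=0.1)
print(in_bcc_region(0.2, 0.4, 0.25, **info))   # all four inequalities hold
print(in_bcc_region(0.35, 0.4, 0.25, **info))  # R0 exceeds min(I(U;Y), I(U;Z))
```

By the definition of $\mathcal{C}^{\mathrm{BCC}}$, a pair $(R_0, R_1)$ belongs to the secrecy-capacity region exactly when the triple $(R_0, R_1, R_1)$ is in $\mathcal{R}^{\mathrm{BCC}}$. The same helper therefore applies with `Re` set equal to `R1`.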


The typical shape of $\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V})$ is illustrated in Figure 3.13. Note that the upper bound for the equivocation $R_e$ is similar to that obtained in Theorem 3.2 for the DWTC and involves the difference between two information rates. However, the expression includes the auxiliary random variables $U$ and $V$. Exactly how and why $U$ and $V$ appear in the expressions will become clear when we discuss the details of the proof in Section 3.5.2 and Section 3.5.3. At this point, suffice it to say that $U$, which appears as a conditioning random variable, represents the common message decodable by both the legitimate receiver and the eavesdropper, while $V$ accounts for additional randomization in the encoder.

Figure 3.13 Typical shape of $\mathcal{R}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V})$ (axis $R_0$; marked values $I(V;Y|U)$ and $\min(I(U;Y), I(U;Z))$).


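Theorem 3.3 leads to the following characterizations of $\mathcal{C}^{\mathrm{BCC}}$, $\mathcal{R}^{\mathrm{WT}}$, and $C_s^{\mathrm{WT}}$.

Corollary 3.2. Consider a BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. For any joint distribution $p_{UVX}$ on $\mathcal{U}\times\mathcal{V}\times\mathcal{X}$ that factorizes as $p_Up_{V|U}p_{X|V}$, define the set $\mathcal{C}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V})$ as
$$\mathcal{C}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V}) \triangleq \left\{(R_0, R_1): \begin{array}{l}
0\le R_1\le I(V;Y|U) - I(V;Z|U)\\
0\le R_0\le \min\big(I(U;Y), I(U;Z)\big)
\end{array}\right\},$$
where the joint distribution of $U$, $V$, $X$, $Y$, and $Z$ factorizes as $p_Up_{V|U}p_{X|V}p_{YZ|X}$. Then, the secrecy-capacity region of the BCC is the convex set
$$\mathcal{C}^{\mathrm{BCC}} = \bigcup_{p_Up_{V|U}p_{X|V}}\mathcal{C}^{\mathrm{BCC}}(p_Up_{V|U}p_{X|V}).$$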


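Corollary 3.3. Consider a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. For any joint distribution $p_{UVX}$ on $\mathcal{U} \times \mathcal{V} \times \mathcal{X}$ that factorizes as $p_U p_{V|U} p_{X|V}$, define the set $\mathcal{R}^{\text{WT}}(p_U p_{V|U} p_{X|V})$ as

$$
\mathcal{R}^{\text{WT}}(p_U p_{V|U} p_{X|V}) \triangleq \left\{ (R, R_e) :
\begin{array}{l}
0 \leq R_e \leq R \leq I(V;Y) \\
0 \leq R_e \leq I(V;Y|U) - I(V;Z|U)
\end{array}
\right\}.
$$

Then, the rate–equivocation region of the WTC is the convex set

$$
\mathcal{R}^{\text{WT}} = \bigcup_{p_U p_{V|U} p_{X|V}} \mathcal{R}^{\text{WT}}(p_U p_{V|U} p_{X|V}).
$$

In addition, the distributions $p_U p_{V|U} p_{X|V}$ can be limited to those such that

$$
I(U;Y) \leq I(U;Z).
$$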


Remark 3.12. The additional condition $I(U;Y) \leq I(U;Z)$ in Corollary 3.3 may seem unnecessary since $\mathcal{R}^{\text{WT}}$ is already completely characterized by taking the union over all distributions factorizing as $p_U p_{V|U} p_{X|V}$; however, if we were to evaluate $\mathcal{R}^{\text{WT}}$ numerically, we could speed up computations tremendously by focusing on the subset of distributions satisfying the condition $I(U;Y) \leq I(U;Z)$.
Proof of Corollary 3.3. Substituting $R_0 = 0$ into Theorem 3.3 shows that

$$
\mathcal{R}^{\text{WT}} = \bigcup_{p_U p_{V|U} p_{X|V}} \left\{ (R, R_e) :
\begin{array}{l}
0 \leq R_e \leq R \leq I(V;Y|U) + \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_e \leq I(V;Y|U) - I(V;Z|U)
\end{array}
\right\}.
$$

We need to show that, without loss of generality, the upper bound $I(V;Y|U) + \min(I(U;Y), I(U;Z))$ can be replaced by $I(V;Y)$.

This is always true if $p_U p_{V|U} p_{X|V} p_{YZ|X}$ is such that $I(U;Y) \leq I(U;Z)$ because, in this case,

$$
\begin{aligned}
I(V;Y|U) + \min\big(I(U;Y), I(U;Z)\big) &= I(UV;Y) \\
&= I(V;Y) + I(U;Y|V) \\
&= I(V;Y),
\end{aligned}
$$

where the last equality follows from $I(U;Y|V) = 0$ since $U \to V \to Y$ forms a Markov chain. In particular, note that the distributions $p_U p_{V|U} p_{X|V} p_{YZ|X}$ for which $U$ is a constant satisfy $I(U;Y) = I(U;Z) = 0$ and thus $I(U;Y) \leq I(U;Z)$.

Consider now a distribution $p_U p_{V|U} p_{X|V} p_{YZ|X}$ such that $I(U;Y) > I(U;Z)$. Then,

$$
\begin{aligned}
I(V;Y|U) + \min\big(I(U;Y), I(U;Z)\big) &= I(V;Y|U) + I(U;Z) \\
&< I(V;Y|U) + I(U;Y) \\
&= I(V;Y),
\end{aligned}
$$

and, similarly,

$$
\begin{aligned}
I(V;Y|U) - I(V;Z|U) &= \sum_{u \in \mathcal{U}} p_U(u)\big(I(V;Y|U=u) - I(V;Z|U=u)\big) \\
&\leq \max_{u \in \mathcal{U}} \big(I(V;Y|U=u) - I(V;Z|U=u)\big).
\end{aligned}
$$

Therefore, the rates $(R, R_e)$ obtained with a distribution $p_U p_{V|U} p_{X|V} p_{YZ|X}$ such that $I(U;Y) > I(U;Z)$ are upper bounded by the rates obtained with a distribution $p_U p_{V|U} p_{X|V} p_{YZ|X}$ in which $U$ is a constant. Therefore, without loss of generality, we can obtain the entire region $\mathcal{R}^{\text{WT}}$ by restricting the union to the distributions $p_U p_{V|U} p_{X|V} p_{YZ|X}$ that satisfy $I(U;Y) \leq I(U;Z)$, and we can replace the upper bound $I(V;Y|U) + \min(I(U;Y), I(U;Z))$ by $I(V;Y)$.


Corollary 3.4. The secrecy capacity of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is

$$
C_s^{\text{WT}} = \max_{p_{VX}} \big(I(V;Y) - I(V;Z)\big).
$$
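Proof. By Corollary 3.3,

$$
C_s^{\text{WT}} = \max_{p_U p_{V|U} p_{X|V}} \big(I(V;Y|U) - I(V;Z|U)\big),
$$

which can be expanded as

$$
\begin{aligned}
C_s^{\text{WT}} &= \max_{p_U p_{V|U} p_{X|V}} \sum_{u \in \mathcal{U}} p_U(u)\big(I(V;Y|U=u) - I(V;Z|U=u)\big) \\
&= \max_{p_{VX}} \big(I(V;Y) - I(V;Z)\big),
\end{aligned}
$$

where the last equality holds because a weighted average over $u$ cannot exceed its largest term, and that term is achieved by a distribution in which $U$ is a constant.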


Channel comparison 


Although Corollary 3.4 provides an exact characterization of the secrecy capacity, the 
auxiliary random variable $V$ makes the evaluation of $C_s^{\text{WT}}$ arduous and prevents us from
developing much intuition about the possibility of secure communication. Nevertheless, 
it is possible to establish the following general result. 


Lemma 3.4 (Liang et al.). The secrecy capacity of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ depends on the transition probabilities $p_{YZ|X}$ only through the marginal transition probabilities $p_{Y|X}$ and $p_{Z|X}$.


Proof. Consider a code $\mathcal{C}_n$ designed for a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. By definition, the average error probability $P_e(\mathcal{C}_n) = \mathbb{P}[\hat{M} \neq M \mid \mathcal{C}_n]$ is determined by the distribution $p_{MX^nY^n}$ and hence depends on the transition probabilities $p_{Y|X}$ but not on the transition probabilities $p_{Z|X}$. Similarly, by definition, the equivocation $E(\mathcal{C}_n) = H(M|Z^n\mathcal{C}_n)$ is determined by the distribution $p_{MX^nZ^n}$ and hence depends on the transition probabilities $p_{Z|X}$ but not on the transition probabilities $p_{Y|X}$. Consequently, whether a rate is achievable or not depends only on the marginal transition probabilities $p_{Y|X}$ and $p_{Z|X}$.


Intuitively, Lemma 3.4 states that we can understand whether secure communication is possible or not if we can somehow compare the main channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ with the eavesdropper's channel $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$.

We have already studied a specific relation between the main channel and the eaves- 
dropper’s channel when we analyzed the DWTC in Section 3.4. In fact, the transition 
probabilities $p_{YZ|X}$ factorize as $p_{Z|Y}p_{Y|X}$ for a DWTC. We formalize this relation
between the eavesdropper’s channel and the main channel by introducing the notion of 
physically degraded channels. 


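Definition 3.8 (Physically degraded channel). We say that $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is physically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if

$$
\forall (x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \quad p_{YZ|X}(y, z|x) = p_{Z|Y}(z|y)\, p_{Y|X}(y|x)
$$

for some transition probabilities $p_{Z|Y}$. In other words, $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is physically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if $X \to Y \to Z$ forms a Markov chain.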
Figure 3.14 Example of a physically degraded channel.

Figure 3.15 Example of a non-physically degraded channel.


Therefore, a DWTC is simply a WTC in which the eavesdropper’s channel is physically 
degraded with respect to the main channel (see (3.2)). 


Example 3.6. Consider the concatenated channels illustrated in Figure 3.14, which are such that $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is a binary erasure channel BEC($\epsilon$) and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is a binary symmetric channel BSC($\epsilon/2$). By construction, $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is physically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.
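Physical degradedness can be checked by multiplying transition matrices. The Python sketch below assumes (our assumption, not stated in the text) that the second stage in Figure 3.14 keeps 0 and 1 unchanged and replaces an erasure by a uniformly random bit; under that assumption, BEC($\epsilon$) followed by this stage is exactly a BSC($\epsilon/2$). The helper names are hypothetical.

```python
def compose(p_y_given_x, p_z_given_y):
    """Transition matrix of the cascade X -> Y -> Z: rows indexed by x, columns by z."""
    n_y, n_z = len(p_z_given_y), len(p_z_given_y[0])
    return [[sum(row[y] * p_z_given_y[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in p_y_given_x]


def bec(eps):
    # Output alphabet ordered as (0, erasure, 1).
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]


def bsc(p):
    return [[1 - p, p], [p, 1 - p]]


# Assumed second stage p_{Z|Y}: 0 -> 0, 1 -> 1, erasure -> uniform bit.
ERASURE_TO_COIN = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

eps = 0.3
cascade = compose(bec(eps), ERASURE_TO_COIN)
print(cascade)  # approximately [[0.85, 0.15], [0.15, 0.85]], i.e. BSC(eps/2)
```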


Since physical degradedness is a stringent constraint, it is useful to consider weaker relations between the channels $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ obtained from the marginals of a broadcast channel $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$.


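Definition 3.9 (Stochastically degraded channel). We say that $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is stochastically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if there exists a channel $(\mathcal{Y}, p_{Z|Y}, \mathcal{Z})$ such that

$$
\forall (x, z) \in \mathcal{X} \times \mathcal{Z} \quad p_{Z|X}(z|x) = \sum_{y \in \mathcal{Y}} p_{Z|Y}(z|y)\, p_{Y|X}(y|x).
$$

In other words, $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ has the same marginal as a channel that is physically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.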


Example 3.7. Consider the broadcast channel illustrated in Figure 3.15 in which $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is a binary erasure channel BEC($\epsilon$) and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is a binary symmetric channel BSC($p$) with $p \in [0, 1/2]$. If $0 \leq \epsilon \leq 2p$, then $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is stochastically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$. This fact is a consequence of a more general result that we derive in Proposition 6.4.
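For Example 3.7, one explicit degrading channel can be written down when $\epsilon \leq 2p$. In the sketch below (our own construction, not necessarily the one used in Proposition 6.4), $p_{Z|Y}$ maps an erasure to a uniform bit and flips an unerased bit with probability $a = (p - \epsilon/2)/(1-\epsilon)$; assuming $p \leq 1/2$ and $\epsilon < 1$, this is a valid probability exactly when $\epsilon \leq 2p$. Function names are hypothetical.

```python
def compose(p_y_given_x, p_z_given_y):
    """Transition matrix of the cascade X -> Y -> Z."""
    n_y, n_z = len(p_z_given_y), len(p_z_given_y[0])
    return [[sum(row[y] * p_z_given_y[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in p_y_given_x]


def bec(eps):
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]  # outputs (0, erasure, 1)


def bsc(p):
    return [[1 - p, p], [p, 1 - p]]


def degrading_channel(eps, p):
    """Return p_{Z|Y} such that BEC(eps) followed by p_{Z|Y} equals BSC(p), or None.

    Assumes 0 <= eps < 1 and 0 <= p <= 1/2. Only this particular family of
    channels is tried, so None only means that this construction fails.
    """
    a = (p - eps / 2) / (1 - eps)  # flip probability for unerased symbols
    if not 0.0 <= a <= 1.0:
        return None
    return [[1 - a, a], [0.5, 0.5], [a, 1 - a]]


w = degrading_channel(0.2, 0.15)      # eps <= 2p, so a valid p_{Z|Y} exists
print(compose(bec(0.2), w))           # approximately BSC(0.15)
print(degrading_channel(0.4, 0.15))   # None, since eps > 2p
```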
Note that, as far as the secrecy capacity is concerned, there is no real difference between stochastically degraded channels and physically degraded channels because, by Lemma 3.4, the secrecy capacity depends only on the marginal transition probabilities of the WTC. For stochastically degraded channels, it is possible to relax the assumptions made about the eavesdropper's channel as follows.


Definition 3.10 (Class of stochastically degraded channels). A class of channels is said to be stochastically degraded with respect to a worst channel $(\mathcal{X}, p_{Y_0|X}, \mathcal{Y}_0)$ if and only if every channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ in the class is stochastically degraded with respect to the worst channel.


Proposition 3.3 (Robustness of worst-case design). Given a class of stochastically degraded eavesdropper's channels, a wiretap code ensuring equivocation $R_e$ for the worst channel guarantees at least the same equivocation for any eavesdropper's channel in the class.


Proof. The result is a direct consequence of the data-processing inequality. Let $M$ be the message sent, let $Y_0^n$ denote the output of the worst channel, and let $Y^n$ denote the output of the eavesdropper's channel. Note that $Y^n$ is statistically indistinguishable from the output of a physically degraded channel for which $M \to Y_0^n \to Y^n$; therefore, the data-processing inequality ensures that

$$
\frac{1}{n} H(M|Y^n) \geq \frac{1}{n} H(M|Y_0^n) \geq R_e.
$$


Despite its simplicity, Proposition 3.3 has useful applications, as illustrated by the
following examples. 


Example 3.8. Consider a binary erasure wiretap channel BEC($\epsilon$) as in Figure 3.3, for which only a lower bound $\epsilon^*$ of the eavesdropper's erasure probability is known. It is easy to verify that the set of erasure channels with erasure probability $\epsilon \geq \epsilon^*$ is a class of stochastically degraded channels, for which the worst channel is the one with erasure probability $\epsilon^*$. Proposition 3.3 ensures that a wiretap code designed for the worst channel guarantees secrecy no matter what the actual erasure probability is.
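The degradedness claim in Example 3.8 can be verified with one more erasure stage: passing the output of BEC($\epsilon^*$) through a channel that erases each unerased symbol with probability $\beta = (\epsilon - \epsilon^*)/(1 - \epsilon^*)$ yields BEC($\epsilon$) for any $\epsilon \geq \epsilon^*$. The Python sketch below checks this numerically, assuming $\epsilon^* < 1$; the helper names are hypothetical.

```python
def compose(p_y_given_x, p_z_given_y):
    """Transition matrix of the cascade X -> Y -> Z."""
    n_y, n_z = len(p_z_given_y), len(p_z_given_y[0])
    return [[sum(row[y] * p_z_given_y[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in p_y_given_x]


def bec(eps):
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]  # outputs (0, erasure, 1)


def extra_erasures(beta):
    """Channel on (0, erasure, 1): erase an unerased symbol w.p. beta; keep erasures."""
    return [[1 - beta, beta, 0.0], [0.0, 1.0, 0.0], [0.0, beta, 1 - beta]]


def degrade_bec(eps_star, eps):
    """BEC(eps_star) followed by extra erasures; equals BEC(eps) when eps >= eps_star."""
    beta = (eps - eps_star) / (1 - eps_star)
    return compose(bec(eps_star), extra_erasures(beta))


print(degrade_bec(0.2, 0.5))  # approximately bec(0.5) = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
```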


Example 3.9. Another application of Proposition 3.3 is the situation in which we do not 
know how to design a code for a specific channel. For instance, we will see in Chapter 6 
that designing wiretap codes for channels other than erasure channels is challenging. If 
we can show that the channel is stochastically degraded with respect to another “simpler” channel $C^*$, we can design a wiretap code for $C^*$. For instance, consider a wiretap channel with a noiseless main channel and a binary symmetric channel BSC($p$) ($p \leq 1/2$) as the eavesdropper's channel. The binary symmetric channel is stochastically degraded with respect to a binary erasure channel BEC($2p$); hence, any wiretap code designed to operate for the latter channel also provides secrecy for the former.
Definition 3.11 (Noisier channel). $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if for every random variable $V$ such that $V \to X \to YZ$ we have $I(V;Y) \geq I(V;Z)$.


Recall from Corollary 3.4 that $C_s^{\text{WT}} = \max_{p_{VX}} (I(V;Y) - I(V;Z))$; therefore, $C_s^{\text{WT}} = 0$ if and only if $I(V;Y) \leq I(V;Z)$ for all Markov chains $V \to X \to YZ$, which is exactly the definition of the main channel being noisier than the eavesdropper's channel. This result is summarized in the following proposition.


Proposition 3.4. The secrecy capacity of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is zero if and only if the main channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ is noisier than the eavesdropper's channel $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$.


Because the definition of “being noisier” involves an auxiliary random variable V, it 
may be much harder to verify that a channel is noisier than to verify that it is physically or 
stochastically degraded. Fortunately, the property “being noisier” admits the following 
characterization, which is sometimes simpler to check. 


Proposition 3.5 (Van Dijk). $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if and only if $I(X;Y) - I(X;Z)$ is a concave function of the input probability distribution $p_X$.


Proof. Suppose $V \to X \to YZ$ forms a Markov chain. Using the conditional independence of $V$ and $Y$ given $X$ we have

$$
\begin{aligned}
I(V;Y) &= I(VX;Y) - I(X;Y|V) \\
&= I(X;Y) + I(V;Y|X) - I(X;Y|V) \\
&= I(X;Y) - I(X;Y|V).
\end{aligned}
$$

The same equality also holds if we replace $Y$ by $Z$; therefore,

$$
\begin{aligned}
I(V;Z) \leq I(V;Y) &\iff I(X;Z) - I(X;Z|V) \leq I(X;Y) - I(X;Y|V) \\
&\iff I(X;Y|V) - I(X;Z|V) \leq I(X;Y) - I(X;Z).
\end{aligned}
\tag{3.33}
$$

For any $v \in \mathcal{V}$, we define the random variable $X_v$ whose distribution satisfies

$$
\forall x \in \mathcal{X} \quad p_{X_v}(x) = p_{X|V}(x|v).
$$

Since $V \to X \to YZ$ forms a Markov chain, note that

$$
\begin{aligned}
I(X;Y|V) &= \sum_{v \in \mathcal{V}} p_V(v)\, I(X;Y|V=v) = \sum_{v \in \mathcal{V}} p_V(v)\, I(X_v;Y), \\
I(X;Z|V) &= \sum_{v \in \mathcal{V}} p_V(v)\, I(X;Z|V=v) = \sum_{v \in \mathcal{V}} p_V(v)\, I(X_v;Z);
\end{aligned}
$$

therefore, we can rewrite (3.33) as

$$
I(V;Z) \leq I(V;Y) \iff \sum_{v \in \mathcal{V}} p_V(v)\big(I(X_v;Y) - I(X_v;Z)\big) \leq I(X;Y) - I(X;Z).
$$

Noting that

$$
\forall x \in \mathcal{X} \quad p_X(x) = \sum_{v \in \mathcal{V}} p_{X|V}(x|v)\, p_V(v) = \sum_{v \in \mathcal{V}} p_{X_v}(x)\, p_V(v),
$$

and treating $I(X;Y) - I(X;Z)$ as a function of the input distribution $p_X$, the condition $\sum_{v \in \mathcal{V}} p_V(v)\big(I(X_v;Y) - I(X_v;Z)\big) \leq I(X;Y) - I(X;Z)$ means that $I(X;Y) - I(X;Z)$ is a concave function of the input distribution $p_X$.


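Example 3.10. Let $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ be a BEC($\epsilon$) and let $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ be a BSC($p$) with $p \in [0, 1/2]$ as in Figure 3.15. We show that $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if and only if

$$
0 \leq \epsilon \leq 4p(1-p).
$$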


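Proof. The result holds trivially if $\epsilon = 0$ or $p = 1/2$. Hence, we assume $\epsilon > 0$ and $p < 1/2$. Let $X \sim \mathcal{B}(q)$ for some $q \in [0, 1]$. Then, $I(X;Y) - I(X;Z)$ is a differentiable function of $q$ given by

$$
f(q) \triangleq (1-\epsilon) H_b(q) - H_b\big(p + q(1-2p)\big) + H_b(p).
$$

By Proposition 3.5, it suffices to determine conditions for $f$ to be concave. After some algebra, one obtains

$$
\frac{d^2 f}{dq^2}(q) \leq 0 \iff -\epsilon q^2 + \epsilon q - \frac{(1-\epsilon)p(1-p)}{(1-2p)^2} \leq 0.
$$

The quadratic polynomial in $q$

$$
-\epsilon q^2 + \epsilon q - \frac{(1-\epsilon)p(1-p)}{(1-2p)^2}
$$

attains its maximum at $q = 1/2$, so it is nonpositive for all $q \in [0, 1]$ if and only if its discriminant $\Delta$ is nonpositive; one can check that

$$
\Delta \leq 0 \iff \epsilon \leq 4p(1-p).
$$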


Notice that, for $2p < \epsilon \leq 4p(1-p)$, $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is noisier than but not stochastically degraded with respect to $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.
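Proposition 3.5 reduces the noisier test to a concavity test, which is easy to probe numerically. The Python sketch below evaluates $f(q)$ from Example 3.10 and checks the sign of central second differences on a grid; it is a numerical sanity check under our own choice of grid and tolerance, not a proof, and the function names are hypothetical. For $p = 0.1$ the threshold $4p(1-p)$ equals $0.36$.

```python
import math


def hb(x):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def f(q, eps, p):
    """I(X;Y) - I(X;Z) for X ~ B(q), main channel BEC(eps), eavesdropper BSC(p)."""
    return (1 - eps) * hb(q) - hb(p + q * (1 - 2 * p)) + hb(p)


def looks_concave(eps, p, h=1e-3, steps=999):
    """True if every central second difference of f on a grid inside (0, 1) is <= 0."""
    for k in range(2, steps - 1):
        q = k / steps
        second_diff = f(q + h, eps, p) - 2 * f(q, eps, p) + f(q - h, eps, p)
        if second_diff > 1e-12:  # small tolerance for rounding errors
            return False
    return True


p = 0.1
for eps in (0.0, 0.1, 0.3, 0.45, 0.6):
    # numerical concavity test next to the closed-form condition of Example 3.10
    print(eps, looks_concave(eps, p), eps <= 4 * p * (1 - p))
```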


Definition 3.12 (Less capable channel). $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is less capable than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if for every input $X$ we have $I(X;Y) \geq I(X;Z)$.


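Example 3.11. Let $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ be a BEC($\epsilon$) and let $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ be a BSC($p$) with $p \in [0, 1/2]$ as in Figure 3.15. We show that $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is less capable than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ if

$$
0 \leq \epsilon \leq H_b(p).
$$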


Proof. The result holds trivially if $p = 1/2$; hence, without loss of generality, we assume $p < 1/2$. Assume $\epsilon \leq H_b(p)$, let $X \sim \mathcal{B}(q)$ for some $q \in [0, 1]$, and let $f$ be defined as in Example 3.10. Since $f(q) = f(1-q)$, the function $f$ is symmetric around $q = 1/2$. In addition, because $f(0) = 0$ and $f(1/2) = H_b(p) - \epsilon \geq 0$, we prove that $f(q) \geq 0$ for $q \in [0, 1]$ by showing that $df/dq(0) \geq 0$ and that $df/dq$ changes sign at most once in the interval $[0, 1/2]$. After some algebra, one obtains

$$
\begin{aligned}
\frac{df}{dq}(q) \geq 0
&\iff (1-\epsilon) \log\left(\frac{1-q}{q}\right) - (1-2p) \log\left(\frac{1-p-q(1-2p)}{p+q(1-2p)}\right) \geq 0 \\
&\iff \alpha\big(p + q(1-2p)\big)(1-q) - \big(1-p-q(1-2p)\big)q \geq 0 \\
&\iff -(\alpha-1)(1-2p)q^2 + q\big(\alpha(1-3p) - (1-p)\big) + \alpha p \geq 0,
\end{aligned}
$$


with 


The quadratic polynomial in $q$

$$
P(q) \triangleq -(\alpha-1)(1-2p)q^2 + q\big(\alpha(1-3p) - (1-p)\big) + \alpha p
$$

is such that $P(0) = \alpha p \geq 0$; therefore, $df/dq(0) \geq 0$. In addition, $P(q)$ has at most one root in the interval $[0, 1/2]$; therefore, $df/dq$ changes sign at most once, which establishes the result.


Notice that, for $4p(1-p) < \epsilon \leq H_b(p)$, $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is less capable than but not noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.
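Because Example 3.11 involves only binary inputs, the less-capable condition amounts to $f(q) \geq 0$ for all $q \in [0, 1]$, with $f$ as in Example 3.10. The Python sketch below checks this on a grid (grid size and tolerance are our choices; names are hypothetical). Since $f(1/2) = H_b(p) - \epsilon$, the check necessarily fails when $\epsilon > H_b(p)$, so for this pair of channels the condition of Example 3.11 is also necessary.

```python
import math


def hb(x):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def f(q, eps, p):
    """I(X;Y) - I(X;Z) for X ~ B(q), main channel BEC(eps), eavesdropper BSC(p)."""
    return (1 - eps) * hb(q) - hb(p + q * (1 - 2 * p)) + hb(p)


def looks_less_capable(eps, p, steps=2000):
    """True if f(q) >= 0 (up to rounding) at every grid point q = k / steps."""
    return min(f(k / steps, eps, p) for k in range(steps + 1)) >= -1e-12


p = 0.1
print(hb(p))  # about 0.469
for eps in (0.3, 0.45, 0.5):
    print(eps, looks_less_capable(eps, p), eps <= hb(p))
```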


The following proposition shows how Definitions 3.8–3.12 relate to one another.


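Proposition 3.6. Let $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ and $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ be two DMCs and consider the following statements:

(1) $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is physically degraded w.r.t. $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$;
(2) $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is stochastically degraded w.r.t. $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$;
(3) $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$;
(4) $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is less capable than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$.

Then,

$$
(1) \Rightarrow (2) \Rightarrow (3) \Rightarrow (4).
$$

Examples 3.6–3.11 show that the implications of Proposition 3.6 are strict.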


Corollary 3.5. The secrecy capacity of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ in which the eavesdropper's channel is less capable than the main channel is

$$
C_s^{\text{WT}} = \max_{p_X} \big(I(X;Y) - I(X;Z)\big).
$$


Proof. The achievability of rates below $C_s^{\text{WT}}$ follows directly from Corollary 3.3 for general non-degraded channels on choosing $V = X$. To obtain the converse, note that

$$
\begin{aligned}
I(V;Y) &= I(VX;Y) - I(X;Y|V) \\
&= I(X;Y) + I(V;Y|X) - I(X;Y|V) \\
&= I(X;Y) - I(X;Y|V)
\end{aligned}
$$

since $V \to X \to Y$ forms a Markov chain. Similarly,

$$
I(V;Z) = I(X;Z) - I(X;Z|V).
$$

Therefore,

$$
I(V;Y) - I(V;Z) = I(X;Y) - I(X;Z) + I(X;Z|V) - I(X;Y|V).
$$

Now, the difference $I(X;Z|V) - I(X;Y|V)$ can be upper bounded as

$$
\begin{aligned}
I(X;Z|V) - I(X;Y|V) &\leq \max_{p_V p_{X|V}} \big(I(X;Z|V) - I(X;Y|V)\big) \\
&= \max_{p_V p_{X|V}} \sum_{v \in \mathcal{V}} p_V(v)\big(I(X;Z|V=v) - I(X;Y|V=v)\big) \\
&= \max_{p_X} \big(I(X;Z) - I(X;Y)\big) \\
&\leq 0,
\end{aligned}
$$

where the last inequality follows from the assumption that the eavesdropper's channel is less capable than the main channel. Consequently,

$$
I(V;Y) - I(V;Z) \leq I(X;Y) - I(X;Z),
$$

which proves that the choice $V = X$ in Corollary 3.3 is optimal.


From a practical standpoint, the fact that the choice $V = X$ is optimal means that achieving the secrecy capacity does not require additional randomization at the encoder, which is convenient because we do not have an explicit characterization of this randomization. Because of Proposition 3.6, the expression for the secrecy capacity given in Corollary 3.5 also holds if the eavesdropper's channel is noisier than, stochastically degraded with respect to, or physically degraded with respect to the main channel. In retrospect, it might seem surprising that this expression remains the same as the advantage of the main channel over the eavesdropper's channel is weakened; this suggests that additional randomization in the encoder is unnecessary for a large class of channels.

If the eavesdropper’s channel is noisier than the main channel and both channels are 
weakly symmetric, then the secrecy capacity is the difference between the main channel 
capacity and the eavesdropper’s channel capacity. This result generalizes Proposition 3.2, 
which was established for DWTCs. 


Proposition 3.7 (Van Dijk). For a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$, if the eavesdropper's channel is noisier than the main channel and both channels are weakly symmetric, then

$$
C_s^{\text{WT}} = C_m - C_e,
$$

where $C_m$ is the channel capacity of the main channel and $C_e$ that of the eavesdropper's channel.


Proof. From Corollary 3.5, $C_s^{\text{WT}} = \max_{p_X} (I(X;Y) - I(X;Z))$ and, from Proposition 3.5, $I(X;Y) - I(X;Z)$ is a concave function of $p_X$; therefore, we can reiterate the proof of Proposition 3.2. We note that the result also holds if the channels are not weakly symmetric, provided that $I(X;Y)$ and $I(X;Z)$ are maximized by the same input distribution $p_X$.


As illustrated by Example 3.12 below, Proposition 3.7 does not hold if we replace 
“noisier” by “less capable.” 


Example 3.12. Let $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ be a BEC($\epsilon$) and let $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ be a BSC($p$) as in Figure 3.15. For $\epsilon = H_b(p)$, we know from Example 3.11 that $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ is less capable than but not noisier than $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$. Notice that $C_m = 1 - \epsilon = 1 - H_b(p) = C_e$; therefore, $C_m - C_e = 0$ bits. On the other hand, one can check numerically that $C_s^{\text{WT}} \approx 0.026$ bits.
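The effect described in Example 3.12 can be reproduced with a grid search over $X \sim \mathcal{B}(q)$, which covers all input distributions here because the input is binary; Corollary 3.5 applies since the eavesdropper's channel is less capable. The text does not state the value of $p$ behind the figure of $0.026$ bits, so the Python sketch below uses $p = 0.1$ purely as an illustration and does not attempt to match that figure; names and grid size are our own choices.

```python
import math


def hb(x):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def f(q, eps, p):
    """I(X;Y) - I(X;Z) for X ~ B(q), main channel BEC(eps), eavesdropper BSC(p)."""
    return (1 - eps) * hb(q) - hb(p + q * (1 - 2 * p)) + hb(p)


def secrecy_capacity_binary(eps, p, steps=10000):
    """Grid approximation of max_q f(q), i.e. the expression of Corollary 3.5."""
    return max(f(k / steps, eps, p) for k in range(steps + 1))


p = 0.1
eps = hb(p)                            # Example 3.12: eps = H_b(p)
cm_minus_ce = (1 - eps) - (1 - hb(p))  # C_m - C_e = 0 bits
cs = secrecy_capacity_binary(eps, p)
print(cm_minus_ce, cs)  # cs is strictly positive even though C_m - C_e = 0
```

The maximizing $q$ lies away from $1/2$, where $f(1/2) = 0$, which is why the difference of capacities underestimates the secrecy capacity in this case.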


To conclude this section, we illustrate the usefulness of Corollary 3.5 by computing 
the secrecy capacities of several wiretap channels. 


Example 3.13. Consider a broadcast channel in which the main channel is a BSC($p$) and the eavesdropper's channel is a BSC($r$) with $p < r \leq 1/2$. Then, the eavesdropper's channel is stochastically degraded with respect to the main channel and the secrecy capacity is $C_s^{\text{WT}} = H_b(r) - H_b(p)$.
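Example 3.13 can be checked in two ways with the short Python sketch below: the BSC($r$) is reproduced by cascading the BSC($p$) with a BSC($s$), $s = (r-p)/(1-2p)$, which exhibits the stochastic degradedness, and a grid search of $\max_q (I(X;Y) - I(X;Z))$ over $X \sim \mathcal{B}(q)$ returns $H_b(r) - H_b(p)$. The values $p = 0.1$ and $r = 0.2$ are illustrative, and the helper names are hypothetical.

```python
import math


def hb(x):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def bsc_cascade(p, s):
    """Crossover probability of BSC(p) followed by BSC(s)."""
    return p * (1 - s) + (1 - p) * s


def secrecy_gap(q, p, r):
    """I(X;Y) - I(X;Z) for X ~ B(q), main channel BSC(p), eavesdropper BSC(r)."""
    i_xy = hb(p + q * (1 - 2 * p)) - hb(p)
    i_xz = hb(r + q * (1 - 2 * r)) - hb(r)
    return i_xy - i_xz


p, r = 0.1, 0.2
s = (r - p) / (1 - 2 * p)  # degrading BSC, valid since p < r <= 1/2
print(bsc_cascade(p, s))   # approximately 0.2 = r
cs = max(secrecy_gap(k / 1000, p, r) for k in range(1001))
print(cs, hb(r) - hb(p))   # both about 0.253
```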


Example 3.14. Consider a broadcast channel in which the main channel is a BEC($\epsilon_1$) and the eavesdropper's channel is a BEC($\epsilon_2$) with $\epsilon_2 > \epsilon_1$. Although the two channels could be correlated erasure channels, the secrecy capacity is $C_s^{\text{WT}} = \epsilon_2 - \epsilon_1$ because the BEC($\epsilon_2$) is stochastically degraded with respect to the BEC($\epsilon_1$).
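Because Corollary 3.5 involves only the marginals $p_{Y|X}$ and $p_{Z|X}$ (Lemma 3.4), the value in Example 3.14 can be recovered from the two erasure channels alone, without modeling their correlation. The Python sketch below computes $I(X;Y)$ for a DMC given as a transition matrix and maximizes $I(X;Y) - I(X;Z)$ over $X \sim \mathcal{B}(q)$ on a grid; the function names, grid size, and erasure probabilities are illustrative assumptions.

```python
import math


def mutual_information(px, w):
    """I(X;Y) in bits for input distribution px and transition matrix w[x][y]."""
    n_y = len(w[0])
    py = [sum(px[x] * w[x][y] for x in range(len(px))) for y in range(n_y)]
    total = 0.0
    for x, p_x in enumerate(px):
        for y in range(n_y):
            if p_x > 0 and w[x][y] > 0:
                total += p_x * w[x][y] * math.log2(w[x][y] / py[y])
    return total


def bec(eps):
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]  # outputs (0, erasure, 1)


def binary_secrecy_capacity(w_main, w_eve, steps=1000):
    """Grid approximation of max over X ~ B(q) of I(X;Y) - I(X;Z) (Corollary 3.5)."""
    best = 0.0
    for k in range(steps + 1):
        px = [1 - k / steps, k / steps]
        best = max(best, mutual_information(px, w_main) - mutual_information(px, w_eve))
    return best


eps1, eps2 = 0.2, 0.5
print(binary_secrecy_capacity(bec(eps1), bec(eps2)))  # about 0.3 = eps2 - eps1
```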


3.5.2 Achievability proof for the broadcast channel with confidential messages


The idea of the proof is similar to that of Section 3.4.1. The presence of a common message
and absence of physical degradedness require the introduction of two auxiliary random 
variables and make the proof slightly more technical, but the intuition developed earlier 
still applies. We carry out the proof in four steps. 


1. For a fixed distribution $p_{UX}$ on $\mathcal{U} \times \mathcal{X}$, some arbitrary $\epsilon > 0$, and rates $(R_0, R_1)$ such that

$$
R_0 < \min\big(I(U;Y), I(U;Z)\big) \quad\text{and}\quad R_1 < I(X;Y|U) - I(X;Z|U),
$$

we use a random-coding argument to show the existence of a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes $\{\mathcal{C}_n\}_{n \geq 1}$ such that

$$
\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0, \quad \lim_{n \to \infty} \frac{1}{n} H(R|Z^n M_1 M_0 \mathcal{C}_n) = 0, \quad\text{and}\quad \lim_{n \to \infty} \frac{1}{n} L(\mathcal{C}_n) \leq \delta(\epsilon).
$$

The proof combines the superposition coding technique introduced in Section 2.3.3 for broadcast channels with the binning structure of wiretap codes identified in Section 3.4.1. This shows the existence of wiretap codes with rate close to full secrecy and guarantees that $\mathcal{R}_1(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}$, where

$$
\mathcal{R}_1(p_{UX}) \triangleq \left\{ (R_0, R_1, R_e) :
\begin{array}{l}
0 \leq R_0 \leq \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_e \leq R_1 \leq I(X;Y|U) - I(X;Z|U)
\end{array}
\right\}.
$$

2. With a minor modification of the codes identified in Step 1, which can be thought of as an outer code construction, we show that $\mathcal{R}_2(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}$, where

$$
\mathcal{R}_2(p_{UX}) \triangleq \left\{ (R_0, R_1, R_e) :
\begin{array}{l}
0 \leq R_e \leq R_1 \\
R_e \leq I(X;Y|U) - I(X;Z|U) \\
R_1 + R_0 \leq I(X;Y|U) + \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_0 \leq \min\big(I(U;Y), I(U;Z)\big)
\end{array}
\right\}.
$$

3. We show that the region $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$ is convex.

4. We introduce a “prefix channel” $(\mathcal{V}, p_{X|V}, \mathcal{X})$ before the BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ to create a BCC $(\mathcal{V}, p_{YZ|V}, \mathcal{Y}, \mathcal{Z})$. This prefix channel introduces the auxiliary random variable $V$ and accounts for more sophisticated encoders than those used in Step 1. This shows the achievability of the entire rate–equivocation region $\mathcal{R}^{\text{BCC}}$.


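Step 1. Random coding argument
We prove the existence of a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes $\{\mathcal{C}_n\}_{n \geq 1}$ such that

$$
\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0, \qquad \lim_{n \to \infty} \frac{1}{n} H(R|Z^n M_0 M_1 \mathcal{C}_n) = 0, \tag{3.34}
$$

$$
\lim_{n \to \infty} \frac{1}{n} L(\mathcal{C}_n) \leq \delta(\epsilon). \tag{3.35}
$$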


The proof combines the technique of superposition coding introduced in Section 2.3.3 with the technique of wiretap coding discussed in Section 3.4.1. As in Section 3.4.1, we start by combining the two constraints in (3.34) into a single reliability constraint by

• introducing a virtual receiver, hereafter named Charlie, who observes the same channel output $Z^n$ as Eve in the original BCC, but who also has access to $M_1$ and $M_0$ through an error-free side channel;

• using a message $M_d$ in place of the source of local randomness $(\mathcal{R}, p_R)$ and requiring $M_d$ to be decoded by both Bob and Charlie.


A code for this “enhanced” BCC is then defined as follows. 


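Definition 3.13. A $(2^{nR_0}, 2^{nR_1}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ for the enhanced channel consists of the following:

• three message sets, $\mathcal{M}_0 = [\![1, 2^{nR_0}]\!]$, $\mathcal{M}_1 = [\![1, 2^{nR_1}]\!]$, and $\mathcal{M}_d = [\![1, 2^{nR_d}]\!]$;

• an encoding function $f: \mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_d \to \mathcal{X}^n$, which maps each message triple $(m_0, m_1, m_d)$ to a codeword $x^n$;

• a decoding function $g: \mathcal{Y}^n \to (\mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_d) \cup \{?\}$, which maps each channel observation $y^n$ to a message triple $(\hat{m}_0, \hat{m}_1, \hat{m}_d) \in \mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_d$ or an error message $?$;

• a decoding function $h: \mathcal{Z}^n \to \mathcal{M}_0 \cup \{?\}$, which maps each channel observation $z^n$ to a message $\tilde{m}_0 \in \mathcal{M}_0$ or an error message $?$;

• a decoding function $k: \mathcal{Z}^n \times \mathcal{M}_0 \times \mathcal{M}_1 \to \mathcal{M}_d \cup \{?\}$, which maps each message pair $(m_0, m_1)$ and the corresponding channel observation $z^n$ to a message $\tilde{m}_d \in \mathcal{M}_d$ or an error message $?$.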


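We assume that messages $M_0$, $M_1$, and $M_d$ are uniformly distributed. The reliability performance of a $(2^{nR_0}, 2^{nR_1}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ is measured in terms of its average probability of error

$$
P_e(\mathcal{C}_n) \triangleq \mathbb{P}\big[(\hat{M}_0, \hat{M}_1, \hat{M}_d) \neq (M_0, M_1, M_d) \text{ or } \tilde{M}_0 \neq M_0 \text{ or } \tilde{M}_d \neq M_d \,\big|\, \mathcal{C}_n\big],
$$

while its secrecy performance is still measured in terms of the leakage

$$
L(\mathcal{C}_n) \triangleq I(M_1; Z^n|\mathcal{C}_n).
$$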


Because $M_d$ is a dummy message that corresponds to a specific choice for the source of local randomness $(\mathcal{R}, p_R)$, a $(2^{nR_0}, 2^{nR_1}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ for the enhanced BCC is also a $(2^{nR_0}, 2^{nR_1}, n)$ code for the original BCC. By construction, the probability of error over the original BCC is at most $P_e(\mathcal{C}_n)$, since

$$
\mathbb{P}\big[(\hat{M}_0, \hat{M}_1) \neq (M_0, M_1) \text{ or } \tilde{M}_0 \neq M_0 \,\big|\, \mathcal{C}_n\big] \leq P_e(\mathcal{C}_n).
$$

In addition, using Fano's inequality, we have

$$
\frac{1}{n} H(M_d|Z^n M_0 M_1 \mathcal{C}_n) \leq \delta\big(P_e(\mathcal{C}_n)\big).
$$

Therefore, if $\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0$, the constraints (3.34) are automatically satisfied.
We begin by choosing a joint distribution $p_{UX}$ on $\mathcal{U} \times \mathcal{X}$ and we assume, without loss of generality, that

$$
I(X;Y|U) - I(X;Z|U) > 0 \quad\text{and}\quad I(X;Z|U) > 0;
$$

otherwise the result follows from the channel coding theorem as discussed in Remark 3.10. Let $0 < \epsilon < \mu_{UXYZ}$, where

$$
\mu_{UXYZ} \triangleq \min_{(u,x,y,z) \in \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}} p_{UXYZ}(u, x, y, z),
$$

and let $n \in \mathbb{N}^*$. Let $R_0 > 0$, $R_1 > 0$, and $R_d > 0$ be rates to be specified later. We construct a $(2^{nR_0}, 2^{nR_1}, 2^{nR_d}, n)$ code for the enhanced BCC by combining superposition coding and binning as follows.


• Codebook construction. Construct codewords $u^n(m_0)$ for $m_0 \in [\![1, 2^{nR_0}]\!]$ by generating symbols $u_i(m_0)$ with $i \in [\![1, n]\!]$ and $m_0 \in [\![1, 2^{nR_0}]\!]$ independently according to $p_U$. Then, for every $u^n(m_0)$, generate codewords $x^n(m_0, m_1, m_d)$ for $m_1 \in [\![1, 2^{nR_1}]\!]$ and $m_d \in [\![1, 2^{nR_d}]\!]$ by generating symbols $x_i(m_0, m_1, m_d)$ with $i \in [\![1, n]\!]$, $m_1 \in [\![1, 2^{nR_1}]\!]$, and $m_d \in [\![1, 2^{nR_d}]\!]$ independently at random according to $p_{X|U=u_i(m_0)}$.

• Alice's encoder $f$. Given $(m_0, m_1, m_d)$, transmit $x^n(m_0, m_1, m_d)$.

• Bob's decoder $g$. Given $y^n$, output $(\hat{m}_0, \hat{m}_1, \hat{m}_d)$ if it is the unique triple such that $(u^n(\hat{m}_0), x^n(\hat{m}_0, \hat{m}_1, \hat{m}_d), y^n) \in \mathcal{T}_\epsilon^n(UXY)$. Otherwise, output an error $?$.

• Eve's decoder $h$. Given $z^n$, output $\tilde{m}_0$ if it is the unique message such that $(u^n(\tilde{m}_0), z^n) \in \mathcal{T}_\epsilon^n(UZ)$. Otherwise, output an error $?$.

• Charlie's decoder $k$. Given $z^n$, $m_0$, and $m_1$, output $\tilde{m}_d$ if it is the unique message such that $(u^n(m_0), x^n(m_0, m_1, \tilde{m}_d), z^n) \in \mathcal{T}_\epsilon^n(UXZ)$. Otherwise, output an error $?$.
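The codebook construction above can be sketched in a few lines of Python. This is a toy illustration of superposition coding with binning under our own assumptions: message indices start at 0 instead of 1, the message-set sizes are passed directly as integers (standing for $2^{nR_0}$, $2^{nR_1}$, $2^{nR_d}$), and distributions are given as dictionaries. The typicality decoders $g$, $h$, and $k$ are not implemented, and all names are hypothetical.

```python
import random


def superposition_codebook(n, m0, m1, md, p_u, p_x_given_u, seed=0):
    """Random superposition codebook for the enhanced BCC (toy sketch).

    p_u: dict u -> probability; p_x_given_u: dict u -> (dict x -> probability).
    Returns (u_code, x_code): u_code[i0] is the cloud center u^n(m0), and
    x_code[(i0, i1, i_d)] is the satellite codeword x^n(m0, m1, md).
    """
    rng = random.Random(seed)

    def draw(dist):
        symbols = list(dist)
        return rng.choices(symbols, weights=[dist[s] for s in symbols], k=1)[0]

    # Cloud centers: symbols u_i(m0) drawn i.i.d. according to p_U.
    u_code = {i0: tuple(draw(p_u) for _ in range(n)) for i0 in range(m0)}
    x_code = {}
    for i0, u in u_code.items():
        for i1 in range(m1):
            for i_d in range(md):
                # Satellites: x_i(m0, m1, md) drawn independently from p_{X|U=u_i(m0)}.
                x_code[(i0, i1, i_d)] = tuple(draw(p_x_given_u[ui]) for ui in u)
    return u_code, x_code


def encode(x_code, m0, m1, md):
    """Alice's encoder f: transmit x^n(m0, m1, md)."""
    return x_code[(m0, m1, md)]


# Illustrative parameters: binary U, and X equal to U flipped with probability 0.2.
p_u = {0: 0.5, 1: 0.5}
p_x_given_u = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}
u_code, x_code = superposition_codebook(8, 2, 4, 2, p_u, p_x_given_u)
print(u_code[0], encode(x_code, 0, 3, 1))
```

Every satellite codeword attached to the same cloud center shares the symbols $u_i(m_0)$ it was drawn around, which is what allows Eve to decode $m_0$ from $u^n(m_0)$ alone.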


The random variable that represents the randomly generated codebook $\mathcal{C}_n$ is denoted by $C_n$. By combining the analysis of superposition coding in Section 2.3.3 with the analysis of wiretap coding in Section 3.4.1, we can prove that, if

$$
\begin{aligned}
R_0 &< \min\big(I(U;Y), I(U;Z)\big) - \delta(\epsilon), \\
R_1 + R_d &< I(X;Y|U) - \delta(\epsilon), \\
R_d &< I(X;Z|U) - \delta(\epsilon),
\end{aligned}
\tag{3.36}
$$

then

$$
\mathbb{E}[P_e(C_n)] \leq \delta_\epsilon(n). \tag{3.37}
$$


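Next, we compute an upper bound for $\mathbb{E}[(1/n)L(C_n)]$. Note that

$$
\begin{aligned}
\mathbb{E}\left[\frac{1}{n} L(C_n)\right] &= \frac{1}{n} I(M_1; Z^n|C_n) \\
&\leq \frac{1}{n} I(M_1; Z^n M_0|C_n) \\
&= \frac{1}{n} I(M_1 X^n; Z^n M_0|C_n) - \frac{1}{n} I(X^n; Z^n M_0|M_1 C_n) \\
&= \frac{1}{n} I(X^n; Z^n M_0|C_n) + \frac{1}{n} I(M_1; Z^n M_0|X^n C_n) - \frac{1}{n} I(X^n; Z^n M_0|M_1 C_n) \\
&\overset{(a)}{=} \frac{1}{n} I(X^n; M_0|C_n) + \frac{1}{n} I(X^n; Z^n|M_0 C_n) - \frac{1}{n} I(X^n; Z^n M_0|M_1 C_n) \\
&\overset{(b)}{=} \frac{1}{n} H(M_0|C_n) + \frac{1}{n} I(X^n; Z^n|M_0 C_n) - \frac{1}{n} H(X^n|M_1 C_n) + \frac{1}{n} H(X^n|Z^n M_0 M_1 C_n) \\
&\overset{(c)}{=} \frac{1}{n} I(X^n; Z^n|M_0 C_n) - \frac{1}{n} H(M_d|C_n) + \frac{1}{n} H(X^n|Z^n M_0 M_1 C_n),
\end{aligned}
\tag{3.38}
$$

where (a) follows from $I(M_1; Z^n M_0|X^n C_n) = 0$, (b) follows from $I(X^n; M_0|C_n) = H(M_0|C_n)$, and (c) follows from $H(X^n|M_1 C_n) = H(M_0|C_n) + H(M_d|C_n)$. We now bound each of the terms on the right-hand side of (3.38) individually. First, notice that the code construction ensures that

$$
\frac{1}{n} H(M_d|C_n) = \sum_{\mathcal{C}_n} p_{C_n}(\mathcal{C}_n) \frac{1}{n} H(M_d|C_n = \mathcal{C}_n) = R_d. \tag{3.39}
$$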
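Next, using Fano's inequality,

$$
\begin{aligned}
\frac{1}{n} H(X^n|Z^n M_0 M_1 C_n) &= \sum_{\mathcal{C}_n} p_{C_n}(\mathcal{C}_n) \frac{1}{n} H(X^n|Z^n M_0 M_1, C_n = \mathcal{C}_n) \\
&\leq \sum_{\mathcal{C}_n} p_{C_n}(\mathcal{C}_n) \frac{1}{n}\Big(1 + P_e(\mathcal{C}_n) \log 2^{nR_d}\Big) \\
&\leq \delta(n) + \mathbb{E}[P_e(C_n)]\big(R_d + \delta(n)\big) \\
&\leq \delta_\epsilon(n),
\end{aligned}
\tag{3.40}
$$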


where the last inequality follows from (3.37). Finally, note that, given a code $\mathcal{C}_n$, there is a one-to-one mapping between the message $M_0$ and the codeword $U^n$. In addition, $C_n U^n \to X^n \to Z^n$ forms a Markov chain; therefore,

$$
\begin{aligned}
\frac{1}{n} I(X^n; Z^n|M_0 C_n) &= \frac{1}{n} I(X^n; Z^n|U^n C_n) \\
&= \frac{1}{n} H(Z^n|U^n C_n) - \frac{1}{n} H(Z^n|X^n U^n) \\
&\leq \frac{1}{n} H(Z^n|U^n) - \frac{1}{n} H(Z^n|X^n U^n) \\
&= \frac{1}{n} I(X^n; Z^n|U^n) \\
&= I(X;Z|U),
\end{aligned}
\tag{3.41}
$$

where we have used the fact that $(U^n, X^n, Z^n)$ is i.i.d. according to $p_{UXZ}$. On substituting (3.39), (3.40), and (3.41) into (3.38), we obtain


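$$
\mathbb{E}\left[\frac{1}{n} L(C_n)\right] \leq I(X;Z|U) - R_d + \delta_\epsilon(n).
$$

In particular, if we choose $R_1$ and $R_d$ such that

$$
R_1 < I(X;Y|U) - I(X;Z|U) \quad\text{and}\quad R_d = I(X;Z|U) - \delta(\epsilon), \tag{3.42}
$$

which is compatible with the constraints in (3.36), then $R_d$ almost cancels out the information rate leaked to the eavesdropper and

$$
\mathbb{E}\left[\frac{1}{n} L(C_n)\right] \leq \delta(\epsilon) + \delta_\epsilon(n). \tag{3.43}
$$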


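From (3.37) and (3.43) and by applying the selection lemma to the random variable $C_n$ and the functions $P_e$ and $L$, we conclude that there exists a specific code $\mathcal{C}_n$ such that

$$
P_e(\mathcal{C}_n) \leq \delta_\epsilon(n) \quad\text{and}\quad \frac{1}{n} L(\mathcal{C}_n) \leq \delta(\epsilon) + \delta_\epsilon(n).
$$

Consequently, there exists a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes $\{\mathcal{C}_n\}_{n \geq 1}$ such that

$$
\lim_{n \to \infty} P_e(\mathcal{C}_n) = 0 \quad\text{and}\quad \lim_{n \to \infty} \frac{1}{n} L(\mathcal{C}_n) \leq \delta(\epsilon),
$$

which proves that $(R_0, R_1, R_1 - \delta(\epsilon))$ is achievable. Since $R_0$ and $R_1$ must satisfy the constraints imposed by (3.36) and (3.42), and since $\epsilon$ can be chosen arbitrarily small, we conclude that $\mathcal{R}_1(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}$, where

$$
\mathcal{R}_1(p_{UX}) \triangleq \left\{ (R_0, R_1, R_e) :
\begin{array}{l}
0 \leq R_0 \leq \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_e \leq R_1 \leq I(X;Y|U) - I(X;Z|U)
\end{array}
\right\}. \tag{3.44}
$$

Figure 3.16 Region $\mathcal{R}_1(p_{UX})$ (the region $\mathcal{R}_1(p_{UX})$ in (3.44) and the remainder of the region in the $(R_0, R_e)$ plane; marked values $\min(I(U;Y), I(U;Z))$, $I(X;Y|U) - I(X;Z|U)$, and $I(X;Y|U)$).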


By construction, the joint distribution of the random variables $U$, $X$, $Y$, and $Z$ in $\mathcal{R}_1(p_{UX})$ factorizes as $p_{UX} p_{YZ|X}$.


Step 2. Outer code construction

As illustrated in Figure 3.16, the rate region $\mathcal{R}_1(p_{UX})$ in (3.44) represents only a subset of $\mathcal{R}^{\text{BCC}}(p_{UVX})$; nevertheless, the entire region can be obtained with minor modifications of the $(2^{nR_0}, 2^{nR_1}, 2^{nR_d}, n)$ codes $\mathcal{C}_n$ identified in Step 1. The key idea is to exploit part of the dummy message $M_d$ and part of the common message $M_0$ as individual messages. This can be performed by introducing sub-bins for $M_0$ and $M_d$ as done in Section 3.4.1, and we provide a sketch of the proof only.


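• By using a fraction of the dummy message rate $R_d = I(X;Z|U) - \delta(\epsilon)$, we can increase the individual-message rate without changing the equivocation and without changing the common-message rate. Hence, the region $\mathcal{R}_1'(p_{UX})$ defined as

$$
\mathcal{R}_1'(p_{UX}) \triangleq \left\{ (R_0, R_1, R_e) :
\begin{array}{l}
0 \leq R_e \leq R_1 \\
0 \leq R_e \leq I(X;Y|U) - I(X;Z|U) \\
0 \leq R_0 \leq \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_1 \leq I(X;Y|U)
\end{array}
\right\}
$$

satisfies $\mathcal{R}_1'(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}$.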

• By sacrificing a fraction of the common-message rate $R_0$, we can further increase the individual-message rate without changing the equivocation; however, because of the constraints (3.36), the trade-off between the individual-message rate and the common-message rate is limited by

$$
R_0 + R_1 \leq \min\big(I(U;Y), I(U;Z)\big) + I(X;Y|U).
$$
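Hence, the region $\mathcal{R}_2(p_{UX})$ defined as

$$
\mathcal{R}_2(p_{UX}) \triangleq \left\{ (R_0, R_1, R_e) :
\begin{array}{l}
0 \leq R_e \leq R_1 \\
0 \leq R_e \leq I(X;Y|U) - I(X;Z|U) \\
0 \leq R_0 \leq \min\big(I(U;Y), I(U;Z)\big) \\
0 \leq R_0 + R_1 \leq I(X;Y|U) + \min\big(I(U;Y), I(U;Z)\big)
\end{array}
\right\} \tag{3.45}
$$

satisfies $\mathcal{R}_2(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}$.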


Finally, since the distribution $p_{UX}$ is arbitrary, we obtain

$$
\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX}) \subseteq \mathcal{R}^{\text{BCC}}.
$$


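Step 3. Convexity of $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$

We show that $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$ is convex by proving that, for any distributions $p_{U_1X_1}$ and $p_{U_2X_2}$ on $\mathcal{U} \times \mathcal{X}$, the convex hull of $\mathcal{R}_2(p_{U_1X_1}) \cup \mathcal{R}_2(p_{U_2X_2})$ is included in $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$.

Let $(R_{0,1}, R_{1,1}, R_{e,1}) \in \mathcal{R}_2(p_{U_1X_1})$ be a rate triple satisfying the inequalities in (3.45) for some random variables $U_1$, $X_1$, $Y_1$, and $Z_1$ whose joint distribution satisfies

$$
\forall (u, x, y, z) \in \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \quad p_{U_1X_1Y_1Z_1}(u, x, y, z) = p_{U_1}(u)\, p_{X_1|U_1}(x|u)\, p_{YZ|X}(y, z|x).
$$

Let $(R_{0,2}, R_{1,2}, R_{e,2}) \in \mathcal{R}_2(p_{U_2X_2})$ be another rate triple satisfying the inequalities in (3.45) for some random variables $U_2$, $X_2$, $Y_2$, and $Z_2$ whose joint distribution satisfies

$$
\forall (u, x, y, z) \in \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \quad p_{U_2X_2Y_2Z_2}(u, x, y, z) = p_{U_2}(u)\, p_{X_2|U_2}(x|u)\, p_{YZ|X}(y, z|x).
$$

Our objective is to show that, for any $\lambda \in [0, 1]$, there exist random variables $U_\lambda$ and $X_\lambda$ such that

$$
\big(\lambda R_{0,1} + (1-\lambda) R_{0,2},\ \lambda R_{1,1} + (1-\lambda) R_{1,2},\ \lambda R_{e,1} + (1-\lambda) R_{e,2}\big) \in \mathcal{R}_2(p_{U_\lambda X_\lambda}).
$$

We introduce a random variable $Q \in \{1, 2\}$ that is independent of all others such that

$$
Q \triangleq \begin{cases} 1 & \text{with probability } \lambda, \\ 2 & \text{with probability } 1 - \lambda. \end{cases}
$$

By construction, $Q \to U_Q \to X_Q \to Y_Q Z_Q$ forms a Markov chain, and the joint distribution of $U_Q$, $X_Q$, $Y_Q$, and $Z_Q$ satisfies

$$
\forall (u, x, y, z) \in \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \quad p_{U_QX_QY_QZ_Q}(u, x, y, z) = p_{U_Q}(u)\, p_{X_Q|U_Q}(x|u)\, p_{YZ|X}(y, z|x).
$$

Since $Q \to U_Q \to X_Q \to Y_Q Z_Q$ forms a Markov chain, notice that

$$
\begin{aligned}
I(X_Q; Y_Q|U_Q) &= I(X_Q; Y_Q|U_Q Q), \\
I(X_Q; Z_Q|U_Q) &= I(X_Q; Z_Q|U_Q Q), \\
I(U_Q; Y_Q) &\geq I(U_Q; Y_Q|Q), \\
I(U_Q; Z_Q) &\geq I(U_Q; Z_Q|Q).
\end{aligned}
$$

We set $U_\lambda \triangleq U_Q$, $X_\lambda \triangleq X_Q$, $Y_\lambda \triangleq Y_Q$, and $Z_\lambda \triangleq Z_Q$. Then,

$$
\begin{aligned}
&\lambda(R_{0,1} + R_{1,1}) + (1-\lambda)(R_{0,2} + R_{1,2}) \\
&\quad\leq \lambda\Big(I(X_1; Y_1|U_1) + \min\big(I(U_1; Y_1), I(U_1; Z_1)\big)\Big) + (1-\lambda)\Big(I(X_2; Y_2|U_2) + \min\big(I(U_2; Y_2), I(U_2; Z_2)\big)\Big) \\
&\quad= \lambda I(X_1; Y_1|U_1) + (1-\lambda) I(X_2; Y_2|U_2) + \lambda \min\big(I(U_1; Y_1), I(U_1; Z_1)\big) + (1-\lambda) \min\big(I(U_2; Y_2), I(U_2; Z_2)\big) \\
&\quad\leq I(X_Q; Y_Q|U_Q) + \min\big(I(U_Q; Y_Q|Q), I(U_Q; Z_Q|Q)\big) \\
&\quad\leq I(X_Q; Y_Q|U_Q) + \min\big(I(U_Q; Y_Q), I(U_Q; Z_Q)\big) \\
&\quad= I(X_\lambda; Y_\lambda|U_\lambda) + \min\big(I(U_\lambda; Y_\lambda), I(U_\lambda; Z_\lambda)\big).
\end{aligned}
$$

Similarly, we can show

$$
\begin{aligned}
\lambda R_{e,1} + (1-\lambda) R_{e,2} &\leq I(X_Q; Y_Q|U_Q) - I(X_Q; Z_Q|U_Q) \\
&= I(X_\lambda; Y_\lambda|U_\lambda) - I(X_\lambda; Z_\lambda|U_\lambda).
\end{aligned}
$$

Hence, for any $\lambda \in [0, 1]$, there exist $U_\lambda$ and $X_\lambda$ such that

$$
\big(\lambda R_{0,1} + (1-\lambda) R_{0,2},\ \lambda R_{1,1} + (1-\lambda) R_{1,2},\ \lambda R_{e,1} + (1-\lambda) R_{e,2}\big) \in \mathcal{R}_2(p_{U_\lambda X_\lambda}) \subseteq \mathcal{R}^{\text{BCC}}.
$$

Therefore, the convex hull of $\mathcal{R}_2(p_{U_1X_1}) \cup \mathcal{R}_2(p_{U_2X_2})$ is in $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$ and $\bigcup_{p_{UX}} \mathcal{R}_2(p_{UX})$ is convex.

Figure 3.17 Addition of prefix channel to BCC.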


Step 4. Addition of prefix channel 

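Consider an arbitrary DMC $(\mathcal{V}, p_{X|V}, \mathcal{X})$ that we append before the BCC
$(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. The encoder can implement this prefix channel with its source of local randomness. As illustrated in Figure 3.17, the concatenation defines a new BCC $(\mathcal{V}, p_{YZ|V}, \mathcal{Y}, \mathcal{Z})$ such that

$$\forall (v,y,z)\in\mathcal{V}\times\mathcal{Y}\times\mathcal{Z}\qquad p_{YZ|V}(y,z|v) = \sum_{x\in\mathcal{X}} p_{YZ|X}(y,z|x)\,p_{X|V}(x|v).$$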
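Computationally, the prefix channel is a product of transition matrices. The sketch below evaluates the sum above for a toy example of our own choosing (it is not taken from the text): a binary-input BCC whose outputs are independent BSC(0.1) and BSC(0.3) copies of $X$, preceded by a ternary prefix channel in which $v = 2$ emits a uniformly random bit. The helper names are hypothetical.

```python
import itertools


def compose_prefix(p_x_given_v, p_yz_given_x):
    """Return p_{YZ|V}(y,z|v) = sum_x p_{YZ|X}(y,z|x) p_{X|V}(x|v).

    p_x_given_v[v][x] is a list of rows; p_yz_given_x[x] maps (y, z) to a probability.
    """
    result = []
    for row in p_x_given_v:
        out = {}
        for x, px in enumerate(row):
            for yz, p in p_yz_given_x[x].items():
                out[yz] = out.get(yz, 0.0) + px * p
        result.append(out)
    return result


def bcc_two_bscs(p_main, p_eve):
    """BCC with Y = X xor N1 and Z = X xor N2, where N1 and N2 are independent."""
    def flip(out, x, p):
        return 1 - p if out == x else p
    return [{(y, z): flip(y, x, p_main) * flip(z, x, p_eve)
             for y, z in itertools.product((0, 1), repeat=2)} for x in (0, 1)]


prefix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # v = 2 outputs a uniform bit
p_yz_given_v = compose_prefix(prefix, bcc_two_bscs(0.1, 0.3))
for v, row in enumerate(p_yz_given_v):
    print(v, {yz: round(p, 3) for yz, p in sorted(row.items())})
```

Each row of the result is again a probability distribution over $\mathcal{Y}\times\mathcal{Z}$, so the concatenation is a valid BCC with input alphabet $\mathcal{V}$.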


The random-coding argument and outer code construction that led to the characterization of $\mathcal{R}(p_{UX})$ can be reapplied to this new channel. Therefore, we conclude that

$$\bigcup_{p_{UVX}}\left\{(R_0,R_1,R_e):\ \begin{array}{l}
0 \leq R_e \leq R_1\\
0 \leq R_e \leq I(V;Y|U) - I(V;Z|U)\\
0 \leq R_0 + R_1 \leq I(V;Y|U) + \min(I(U;Y), I(U;Z))
\end{array}\right\} \subseteq \mathcal{R}.$$

By construction, $U \to V \to X \to YZ$ forms a Markov chain. It remains to prove that we can restrict the cardinality of $\mathcal{U}$ and $\mathcal{V}$ to $|\mathcal{U}| \leq |\mathcal{X}| + 3$ and $|\mathcal{V}| \leq |\mathcal{X}|^2 + |\mathcal{X}| + 1$. This follows from Carathéodory’s theorem, and we refer interested readers to [18, Appendix] for details.

At this point, the introduction of a prefix channel is somewhat artificial, but the converse proof given in the next section shows that this additional randomization of the stochastic encoder is required for the achievable region to match the outer bound obtained there. From a practical perspective, the prefix channel suggests that additional randomization may be required in the encoder; this might not be surprising, because the equivocation calculation in Step 2 relies on a specific stochastic encoding scheme that might not be optimal. Fortunately, Corollary 3.5 shows that this additional randomization is not necessary if the eavesdropper’s channel is less capable than the main channel.


3.5.3 Converse proof for the broadcast channel with confidential messages


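Consider an achievable rate triple $(R_0, R_1, R_e)$ and let $\epsilon > 0$. For $n$ sufficiently large, there exists a $(2^{nR_0}, 2^{nR_1}, n)$ code $\mathcal{C}_n$ such that

$$\frac{1}{n}H(M_0|\mathcal{C}_n) \geq R_0,\qquad \frac{1}{n}H(M_1|\mathcal{C}_n) \geq R_1,\qquad \frac{1}{n}\mathbf{E}(\mathcal{C}_n) \geq R_e - \delta(\epsilon),\qquad P_e(\mathcal{C}_n) \leq \delta(\epsilon).$$

In the remainder of this section, we drop the conditioning on $\mathcal{C}_n$ to simplify the notation. Using Fano’s inequality, we obtain

$$\frac{1}{n}H(M_0M_1|Y^n) \leq \delta(\epsilon)\quad\text{and}\quad \frac{1}{n}H(M_0|Z^n) \leq \delta(\epsilon).$$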
n n 


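Therefore,

$$\begin{aligned}
R_e &\leq \frac{1}{n}\mathbf{E}(\mathcal{C}_n) + \delta(\epsilon)\\
&= \frac{1}{n}H(M_1|Z^n) + \delta(\epsilon)\\
&= \frac{1}{n}H(M_1|Z^nM_0) + \frac{1}{n}I(M_0;M_1|Z^n) + \delta(\epsilon)\\
&\leq \frac{1}{n}H(M_1|M_0) - \frac{1}{n}I(M_1;Z^n|M_0) + \frac{1}{n}H(M_0|Z^n) + \delta(\epsilon)\\
&\leq \frac{1}{n}H(M_1|M_0) - \frac{1}{n}I(M_1;Z^n|M_0) + \delta(\epsilon)\\
&= \frac{1}{n}I(M_1;Y^n|M_0) - \frac{1}{n}I(M_1;Z^n|M_0) + \frac{1}{n}H(M_1|Y^nM_0) + \delta(\epsilon)\\
&\leq \frac{1}{n}I(M_1;Y^n|M_0) - \frac{1}{n}I(M_1;Z^n|M_0) + \delta(\epsilon).
\end{aligned}\tag{3.46}$$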


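Single-letterizing (3.46) is more arduous than in Section 3.4.2 because the channel is not degraded. The solution to circumvent this difficulty is a standard technique from multi-user information theory, which consists of symmetrizing the expression by introducing the vectors

$$Y^{i-1} \triangleq (Y_1,\dots,Y_{i-1})\quad\text{and}\quad \tilde{Z}^{i+1} \triangleq (Z_{i+1},\dots,Z_n)\qquad\text{for } i\in[1,n],$$

with the convention that $Y^0 \triangleq \emptyset$ and $\tilde{Z}^{n+1} \triangleq \emptyset$. We introduce $Y^{i-1}$ and $\tilde{Z}^{i+1}$ in $I(M_1;Y^n|M_0)$ as follows:

$$\begin{aligned}
I(M_1;Y^n|M_0) &= \sum_{i=1}^n I(M_1;Y_i|M_0Y^{i-1})\\
&= \sum_{i=1}^n \big(I(M_1\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}) - I(\tilde{Z}^{i+1};Y_i|M_0M_1Y^{i-1})\big)\\
&= \sum_{i=1}^n \big(I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) + I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}) - I(\tilde{Z}^{i+1};Y_i|M_0M_1Y^{i-1})\big).
\end{aligned}\tag{3.47}$$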


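Similarly, we introduce $Y^{i-1}$ and $\tilde{Z}^{i+1}$ in $I(M_1;Z^n|M_0)$ as follows:

$$\begin{aligned}
I(M_1;Z^n|M_0) &= \sum_{i=1}^n I(M_1;Z_i|M_0\tilde{Z}^{i+1})\\
&= \sum_{i=1}^n \big(I(M_1Y^{i-1};Z_i|M_0\tilde{Z}^{i+1}) - I(Y^{i-1};Z_i|M_0M_1\tilde{Z}^{i+1})\big)\\
&= \sum_{i=1}^n \big(I(M_1;Z_i|M_0Y^{i-1}\tilde{Z}^{i+1}) + I(Y^{i-1};Z_i|M_0\tilde{Z}^{i+1}) - I(Y^{i-1};Z_i|M_0M_1\tilde{Z}^{i+1})\big).
\end{aligned}\tag{3.48}$$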
The key observation to simplify these expressions is the following lemma.


Lemma 3.5.

$$\begin{aligned}
\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}) &= \sum_{j=1}^n I(Z_j;Y^{j-1}|M_0\tilde{Z}^{j+1}),\\
\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0M_1Y^{i-1}) &= \sum_{j=1}^n I(Z_j;Y^{j-1}|M_0M_1\tilde{Z}^{j+1}).
\end{aligned}$$

Proof. This result follows from the chain rule of mutual information using an appropriate change of indices:

$$\begin{aligned}
\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}) &= \sum_{i=1}^n\sum_{j=i+1}^n I(Z_j;Y_i|M_0Y^{i-1}\tilde{Z}^{j+1})\\
&= \sum_{j=1}^n\sum_{i=1}^{j-1} I(Z_j;Y_i|M_0Y^{i-1}\tilde{Z}^{j+1})\\
&= \sum_{j=1}^n I(Z_j;Y^{j-1}|M_0\tilde{Z}^{j+1}).
\end{aligned}\tag{3.49}$$

Similarly, one can show that

$$\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0M_1Y^{i-1}) = \sum_{j=1}^n I(Z_j;Y^{j-1}|M_0M_1\tilde{Z}^{j+1}). \tag{3.50}$$
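Lemma 3.5 is a purely combinatorial consequence of the chain rule, so it holds for any joint distribution, not only for distributions induced by a code. The sketch below is our own numerical check: it draws an unstructured random joint pmf over binary variables with $n = 3$ and evaluates both sides of the lemma. The second identity is the same computation with $(M_0, M_1)$ as the conditioning variables, which is why the (hypothetical) helper `lemma_sides` takes the conditioning coordinates as a parameter.

```python
import itertools
import math
import random


def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mi(joint, a, b, c=()):
    """I(A;B|C) in bits, with A, B, C given as tuples of coordinates."""
    def h(idx):
        return entropy(marginal(joint, idx))
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


def random_joint(num_vars, rng):
    """Unstructured random joint pmf over num_vars binary variables."""
    outcomes = list(itertools.product((0, 1), repeat=num_vars))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def lemma_sides(joint, n, cond):
    """Both sides of Lemma 3.5. Coordinates 0..k-1 hold the conditioning
    messages (k = len(cond)), then Y_1..Y_n, then Z_1..Z_n."""
    k = len(cond)

    def y(i):
        return (k + i - 1,)

    def z(i):
        return (k + n + i - 1,)

    def y_past(i):          # Y^{i-1} = (Y_1, ..., Y_{i-1})
        return tuple(range(k, k + i - 1))

    def z_future(i):        # Z~^{i+1} = (Z_{i+1}, ..., Z_n)
        return tuple(range(k + n + i, k + 2 * n))

    lhs = sum(cond_mi(joint, z_future(i), y(i), cond + y_past(i)) for i in range(1, n + 1))
    rhs = sum(cond_mi(joint, z(j), y_past(j), cond + z_future(j)) for j in range(1, n + 1))
    return lhs, rhs


n = 3
lhs, rhs = lemma_sides(random_joint(1 + 2 * n, random.Random(3)), n, cond=(0,))
lhs2, rhs2 = lemma_sides(random_joint(2 + 2 * n, random.Random(4)), n, cond=(0, 1))
print(f"{lhs:.12f} {rhs:.12f} | {lhs2:.12f} {rhs2:.12f}")
```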


Hence, on substituting (3.47) and (3.48) into (3.46), we obtain, with the help of Lemma 3.5,

$$R_e \leq \frac{1}{n}\sum_{i=1}^n \big(I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) - I(M_1;Z_i|M_0Y^{i-1}\tilde{Z}^{i+1})\big) + \delta(\epsilon). \tag{3.51}$$
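To make explicit how Lemma 3.5 removes the extra terms, here is the subtraction of (3.48) from (3.47) written out in full. This is only a restatement of the step leading to (3.51), using the symmetry $I(A;B|C) = I(B;A|C)$, not an additional result:

$$\begin{aligned}
I(M_1;Y^n|M_0) - I(M_1;Z^n|M_0)
&= \sum_{i=1}^n \big(I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) - I(M_1;Z_i|M_0Y^{i-1}\tilde{Z}^{i+1})\big)\\
&\quad + \underbrace{\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}) - \sum_{i=1}^n I(Y^{i-1};Z_i|M_0\tilde{Z}^{i+1})}_{=\,0 \text{ by Lemma 3.5}}\\
&\quad - \underbrace{\Big(\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0M_1Y^{i-1}) - \sum_{i=1}^n I(Y^{i-1};Z_i|M_0M_1\tilde{Z}^{i+1})\Big)}_{=\,0 \text{ by Lemma 3.5}}.
\end{aligned}$$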


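The common message rate $R_0$ can be bounded in a similar manner as

$$\begin{aligned}
R_0 &\leq \frac{1}{n}H(M_0) = \frac{1}{n}I(M_0;Y^n) + \frac{1}{n}H(M_0|Y^n)\\
&\leq \frac{1}{n}I(M_0;Y^n) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n I(M_0;Y_i|Y^{i-1}) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1};Y_i|Y^{i-1}) - \frac{1}{n}\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|Y^{i-1}M_0) + \delta(\epsilon).
\end{aligned}\tag{3.52}$$

On substituting the simple bounds

$$I(\tilde{Z}^{i+1};Y_i|Y^{i-1}M_0) \geq 0\quad\text{and}\quad I(M_0\tilde{Z}^{i+1};Y_i|Y^{i-1}) \leq I(M_0\tilde{Z}^{i+1}Y^{i-1};Y_i)$$

into (3.52), we obtain

$$R_0 \leq \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1}Y^{i-1};Y_i) + \delta(\epsilon). \tag{3.53}$$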
By repeating the steps above with the observation $Z^n$ instead of $Y^n$, we obtain a second bound for $R_0$:

$$\begin{aligned}
R_0 &\leq \frac{1}{n}H(M_0)\\
&\leq \frac{1}{n}\sum_{i=1}^n I(M_0Y^{i-1};Z_i|\tilde{Z}^{i+1}) - \frac{1}{n}\sum_{i=1}^n I(Y^{i-1};Z_i|\tilde{Z}^{i+1}M_0) + \delta(\epsilon),
\end{aligned}\tag{3.54}$$

and therefore

$$R_0 \leq \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1}Y^{i-1};Z_i) + \delta(\epsilon). \tag{3.55}$$
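Finally, we bound the sum rate $R_0 + R_1$ as follows:

$$\begin{aligned}
R_1 + R_0 &\leq \frac{1}{n}H(M_0M_1) = \frac{1}{n}H(M_1|M_0) + \frac{1}{n}H(M_0)\\
&= \frac{1}{n}I(M_1;Y^n|M_0) + \frac{1}{n}H(M_1|Y^nM_0) + \frac{1}{n}H(M_0)\\
&\leq \frac{1}{n}I(M_1;Y^n|M_0) + \frac{1}{n}H(M_0) + \delta(\epsilon).
\end{aligned}$$

From (3.52), we know that

$$\frac{1}{n}H(M_0) \leq \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1};Y_i|Y^{i-1}) - \frac{1}{n}\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|Y^{i-1}M_0) + \delta(\epsilon),$$

and from (3.47), after dropping the last (nonnegative) sum, we have

$$\frac{1}{n}I(M_1;Y^n|M_0) \leq \frac{1}{n}\sum_{i=1}^n I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) + \frac{1}{n}\sum_{i=1}^n I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1}).$$

On combining the two inequalities, the sums involving $I(\tilde{Z}^{i+1};Y_i|M_0Y^{i-1})$ cancel and we obtain

$$R_1 + R_0 \leq \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1};Y_i|Y^{i-1}) + \frac{1}{n}\sum_{i=1}^n I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) + \delta(\epsilon). \tag{3.56}$$

Similarly, using (3.54) in place of (3.52) and using Lemma 3.5, we obtain a second bound

$$R_1 + R_0 \leq \frac{1}{n}\sum_{i=1}^n I(M_0Y^{i-1};Z_i|\tilde{Z}^{i+1}) + \frac{1}{n}\sum_{i=1}^n I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) + \delta(\epsilon). \tag{3.57}$$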
Let us now introduce the random variables

$$U_i \triangleq Y^{i-1}\tilde{Z}^{i+1}M_0\quad\text{and}\quad V_i \triangleq U_iM_1.$$
[Figure 3.18: Functional dependence graph of random variables involved in the coding scheme for n = 3.]


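Using the functional dependence graph illustrated in Figure 3.18, one can verify that the joint distribution of $U_i$, $V_i$, $X_i$, $Y_i$, and $Z_i$ is such that

$$\forall (u,v,x,y,z)\in\mathcal{U}\times\mathcal{V}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\qquad p_{U_iV_iX_iY_iZ_i}(u,v,x,y,z) = p_{U_i}(u)\,p_{V_i|U_i}(v|u)\,p_{X_i|V_i}(x|v)\,p_{YZ|X}(y,z|x),$$

where $p_{YZ|X}$ are the transition probabilities of the BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$. We now introduce a random variable $Q$ that is uniformly distributed over $[1,n]$ and independent of $M_0M_1X^nY^nZ^n$, and we define

$$U \triangleq U_QQ,\quad V \triangleq UM_1,\quad X \triangleq X_Q,\quad Y \triangleq Y_Q,\quad\text{and}\quad Z \triangleq Z_Q. \tag{3.58}$$

Note that $Q \to U \to X \to YZ$ forms a Markov chain. On substituting $U$, $V$, $X$, $Y$, and $Z$ defined by (3.58) into (3.53) and (3.55), we obtain

$$\begin{aligned}
R_0 &\leq \min\Big(\frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1}Y^{i-1};Z_i),\ \frac{1}{n}\sum_{i=1}^n I(M_0\tilde{Z}^{i+1}Y^{i-1};Y_i)\Big) + \delta(\epsilon)\\
&= \min\Big(\frac{1}{n}\sum_{i=1}^n I(U_i;Z_i),\ \frac{1}{n}\sum_{i=1}^n I(U_i;Y_i)\Big) + \delta(\epsilon)\\
&= \min(I(U;Z|Q), I(U;Y|Q)) + \delta(\epsilon)\\
&\leq \min(I(U;Z), I(U;Y)) + \delta(\epsilon),
\end{aligned}\tag{3.59}$$

where the last inequality follows because $Q \to U \to YZ$ forms a Markov chain. Similarly, on substituting (3.58) into (3.51), we obtain

$$\begin{aligned}
R_e &\leq \frac{1}{n}\sum_{i=1}^n \big(I(M_1;Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) - I(M_1;Z_i|M_0Y^{i-1}\tilde{Z}^{i+1})\big) + \delta(\epsilon)\\
&= \frac{1}{n}\sum_{i=1}^n \big(I(V_i;Y_i|U_i) - I(V_i;Z_i|U_i)\big) + \delta(\epsilon)\\
&= I(V;Y|U) - I(V;Z|U) + \delta(\epsilon).
\end{aligned}\tag{3.60}$$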
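The last equality in (3.60) is the usual time-sharing identity. Spelled out, with $Q$ uniform on $[1,n]$ and independent of everything else, and using that $U = U_QQ$ already contains $Q$ while $V_Q = U_QM_1$, it reads

$$\frac{1}{n}\sum_{i=1}^n I(V_i;Y_i|U_i) = \sum_{i=1}^n \mathbb{P}[Q=i]\,I(V_Q;Y_Q|U_Q, Q=i) = I(V_Q;Y_Q|U_QQ) = I(M_1;Y|U) = I(V;Y|U).$$

The same computation with $Z_i$ in place of $Y_i$ handles the second term, and the analogous step with $U_i$ alone gives the terms $I(U;Z|Q)$ and $I(U;Y|Q)$ in (3.59).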


Finally, on substituting (3.58) into (3.56) and (3.57), we obtain

$$R_0 + R_1 \leq I(V;Y|U) + \min(I(U;Z), I(U;Y)) + \delta(\epsilon).$$

Since $\epsilon$ can be chosen arbitrarily small, we finally obtain the converse result

$$\mathcal{R} \subseteq \bigcup_{p_{UVX}}\left\{(R_0,R_1,R_e):\ \begin{array}{l}
0 \leq R_e \leq R_1\\
0 \leq R_e \leq I(V;Y|U) - I(V;Z|U)\\
0 \leq R_0 + R_1 \leq I(V;Y|U) + \min(I(U;Y), I(U;Z))
\end{array}\right\}.$$


3.6 Multiplexing and feedback


3.6.1 Multiplexing secure and non-secure messages


The expression of the secrecy-capacity region $\mathcal{C}^{\text{BCC}}$ in Corollary 3.2 tells us that, for a fixed distribution $p_{UVX}$ on $\mathcal{U}\times\mathcal{V}\times\mathcal{X}$ that factorizes as $p_Up_{V|U}p_{X|V}$, we can transmit a common message to Eve and Bob at a rate arbitrarily close to $\min(I(U;Z), I(U;Y))$ while simultaneously transmitting an individual secret message to Bob at a rate arbitrarily close to $I(V;Y|U) - I(V;Z|U)$. However, this result provides only a partial view of what can be transmitted over the channel, because we know from the proof of Theorem 3.3 that Alice can use a dummy message $M_d$ as her source of local randomness; this message is decodable by Bob and is transmitted at a rate arbitrarily close to $I(V;Z|U)$. Although message $M_d$ is not decodable by Eve, it is not secure either. Hence, we call $M_d$ a public message to distinguish it from the common message and the individual secret message. Therefore, for a BCC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$, there exists a code that conveys three messages reliably:

• a common message for Bob and Eve at a rate close to $\min(I(U;Z), I(U;Y))$;
• a secret message for Bob at a rate close to $I(V;Y|U) - I(V;Z|U)$;
• a public message for Bob at a rate close to $I(V;Z|U)$.

The total transmission rate to Bob, $R_{\text{tot}}$, is then arbitrarily close to

$$\min(I(U;Z), I(U;Y)) + I(V;Y|U).$$


In particular, for the specific choice U = 0, we obtain a total rate to Bob on the order 
of I(V; Y), of which a fraction I(V; Y) — I(V; Z) corresponds to a secure message. As 
illustrated in Figure 3.19, this allows us to interpret a wiretap code for the WTC as a means 
to create two parallel “pipes,” a first pipe transmitting secure messages hidden from the 
eavesdropper and a second pipe transmitting public messages. From this perspective, note 
that transmitting secret messages incurs little rate penalty and comes almost “for free.” 
For some specific WTCs, it is possible to show that secrecy comes exactly “for free.” 


Proposition 3.8. Consider a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ in which the eavesdropper’s channel $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ with capacity $C_e$ is noisier than the main channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ with capacity $C_m$. Assume that both channels are weakly symmetric. Then, there exists a code that transmits simultaneously

• a secret message at a rate $R_s$ arbitrarily close to $C_s = C_m - C_e$ and
• a public message at a rate $R_p$ arbitrarily close to $C_e$.

[Figure 3.19: Communication over a broadcast channel with confidential messages viewed as parallel bit-pipes.]

In other words, it is possible to transmit a secret message at a rate arbitrarily close to the secrecy capacity $C_s$ and still achieve a total reliable transmission rate arbitrarily close to the capacity of the main channel $C_m$.


Proof. Let $\epsilon > 0$. For the specific choice $U = 0$ and $V = X$ in Section 3.5.2, we know that there exists a $(2^{nR_s}, n)$ wiretap code $\mathcal{C}_n$ with $R_s = I(X;Y) - I(X;Z) - \delta(\epsilon)$ that reliably transmits a dummy message at rate $R_d = I(X;Z) - \delta(\epsilon)$. The total transmission rate is then

$$R_{\text{tot}} = R_s + R_d = I(X;Y) - \delta(\epsilon).$$

In general, this does not imply that we can exhaust the capacity of the main channel and transmit at the secrecy capacity, because the distribution maximizing $I(X;Y)$ might not maximize $I(X;Y) - I(X;Z)$ simultaneously. However, if all channels are weakly symmetric and the eavesdropper’s channel is noisier than the main channel, the same distribution (the uniform distribution on $\mathcal{X}$) maximizes both quantities, and it is possible to transmit simultaneously a secure message at rate

$$\max_{p_X}\big(I(X;Y) - I(X;Z)\big) - \delta(\epsilon) = C_m - C_e - \delta(\epsilon)$$

and a public message at rate

$$\max_{p_X} I(X;Z) - \delta(\epsilon) = C_e - \delta(\epsilon),$$

such that the total rate is arbitrarily close to the capacity of the main channel.
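A numerical instance of Proposition 3.8, under assumptions of our own choosing: both channels are binary symmetric, with crossover probabilities 0.1 for Bob and 0.2 for Eve. BSCs are weakly symmetric, and a BSC(0.2) is the cascade of the BSC(0.1) with a BSC(0.125), hence degraded and in particular noisier, so the proposition applies with $C = 1 - h_2(p)$. The function names are illustrative only.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p):
    return 1.0 - h2(p)


def bit_pipes(p_main, p_eve):
    """Secret and public pipe rates of Proposition 3.8 for two BSCs.

    Assumes 0 <= p_main < p_eve <= 0.5, so that Eve's BSC is a degraded
    (hence noisier) version of Bob's and both channels are weakly symmetric.
    """
    if not 0.0 <= p_main < p_eve <= 0.5:
        raise ValueError("need 0 <= p_main < p_eve <= 0.5")
    c_m, c_e = bsc_capacity(p_main), bsc_capacity(p_eve)
    return {"secret": c_m - c_e, "public": c_e, "total": c_m}


rates = bit_pipes(0.1, 0.2)
print({name: round(rate, 4) for name, rate in rates.items()})
```

Here roughly a quarter of a bit per channel use is secret, out of about 0.53 bits per channel use delivered to Bob; the remainder is not wasted but carries public data, which is the bit-pipe reading of Figure 3.19.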


3.6.2 Feedback and secrecy


Although feedback does not increase the channel capacity of a DMC, the situation is quite different for the secrecy capacity. In many situations, it is possible to show that feedback increases the secrecy capacity; however, a more precise statement hinges on additional assumptions regarding the nature of the feedback link. The most general approach to analyze feedback would be to consider a two-way communication channel, in which both the forward channel and the reverse channel are broadcast channels overheard by the eavesdropper. A specific instance of a two-way wiretap channel is analyzed in Chapter 8 in the context of multi-user wiretap channels, but a general solution remains elusive. Even if we simplify the model by considering extreme situations in which the feedback link is either confidential (unheard by the eavesdropper) or public (perfectly heard by the eavesdropper), the fundamental limit is unknown for arbitrary discrete memoryless channels.

[Figure 3.20: WTC with confidential rate-limited feedback.]

In this section, we determine achievable full secrecy rates for a WTC with confidential 
but rate-limited feedback, which is illustrated in Figure 3.20. This model is a variation 
of the WTC, in which Bob has the opportunity to transmit confidential messages to 
Alice at a rate less than $R_f$. Although the model may seem overly optimistic, it provides
valuable insight into how confidential feedback should be used for secrecy. The case of 
public feedback is studied in Chapter 4 in the context of secret-key agreement. 


Definition 3.14. A $(2^{nR}, n)$ code $\mathcal{C}_n$ for a WTC with confidential rate-limited feedback $R_f$ consists of

• a message set $\mathcal{M} = [1, 2^{nR}]$;
• a source of local randomness $(\mathcal{R}_x, p_{R_x})$ at the encoder;
• a source of local randomness $(\mathcal{R}_y, p_{R_y})$ at the receiver;
• a feedback alphabet $\mathcal{F} = [1, F]$ such that $\log F \leq R_f$;
• a sequence of $n$ encoding functions $e_i: \mathcal{M}\times\mathcal{F}^{i-1}\times\mathcal{R}_x \to \mathcal{X}$ for $i \in [1,n]$, which map a message $m$, the causally known feedback symbols $f^{i-1}$, and a realization $r_x$ of the local randomness to a symbol $x_i \in \mathcal{X}$;
• a sequence of $n$ feedback functions $g_i: \mathcal{Y}^{i-1}\times\mathcal{R}_y \to \mathcal{F}$ for $i \in [1,n]$, which map past observations $y^{i-1}$ and a realization $r_y$ of the local randomness to a feedback symbol $f_i \in \mathcal{F}$;
• a decoding function $g: \mathcal{Y}^n\times\mathcal{R}_y \to \mathcal{M}\cup\{?\}$, which maps each channel observation $y^n$ and realization $r_y$ of the local randomness to a message $m \in \mathcal{M}$ or an error message $?$.


We restrict ourselves to the characterization of the secrecy capacity. Note that it is not a priori obvious what the optimal way of exploiting the feedback is. Nevertheless, we can
obtain a lower bound on the secrecy capacity by studying a simple yet effective strategy
based on the exchange of a secret key over the confidential feedback channel. In terms 
of the multiplexing of secure and non-secure messages discussed in Section 3.6.1, the 
idea is to use a wiretap code to create a secret and a public bit-pipe, and to protect part 
of the public messages by performing a one-time pad with the secret key obtained over 
the feedback channel. 


Proposition 3.9 (Yamamoto). The secrecy capacity $C_s^{\text{FB}}$ of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ with confidential rate-limited feedback $R_f$ satisfies

$$C_s^{\text{FB}} \geq \max_{p_{VX}}\min\big(I(V;Y),\ I(V;Y) - I(V;Z) + R_f\big).$$

If the eavesdropper’s channel is noisier than the main channel and both channels are weakly symmetric, the bound can be replaced by

$$C_s^{\text{FB}} \geq \min(C_m,\ C_m - C_e + R_f),$$

where $C_m$ is the main channel capacity and $C_e$ the eavesdropper’s channel capacity.


Proof. Let $\epsilon > 0$, $B \in \mathbb{N}^*$, and $n \in \mathbb{N}^*$. We consider a block-Markov coding scheme over $B$ blocks of length $n$, such that the selection of the transmitted codeword in each block $b \in [2, B]$ is a function of the current messages $m_b$ and $m'_b$ and of an independent secret key $k_{b-1}$ exchanged during the previous block over the secure feedback channel. The secret key allows us to encrypt otherwise insecure parts of the codewords with a one-time pad, and thus leads to higher secure communication rates.

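Formally, the scheme operates as follows. We assume there exists a distribution $p_X$ over $\mathcal{X}$ such that $I(X;Y) - I(X;Z) > 0$. We then consider a $(2^{nR}, 2^{nR_d}, n)$ code $\mathcal{C}_n$ identified by the random-coding argument in Section 3.4.1 with

$$R = I(X;Y) - I(X;Z) - \delta(\epsilon),\qquad R_d = I(X;Z) - \delta(\epsilon),$$

and such that

$$P_e(\mathcal{C}_n) \leq \delta_\epsilon(n)\quad\text{and}\quad \frac{1}{n}\mathbf{L}(\mathcal{C}_n) \leq \delta(\epsilon) + \delta_\epsilon(n).$$

We set $R_0 = \min(R_f, R_d)$ and, for each $m \in [1, 2^{nR}]$, we distribute the codewords $x^n(m, m_d)$ with $m_d \in [1, 2^{nR_d}]$ in $\lceil 2^{nR_0}\rceil$ bins $\mathcal{B}_m(i)$ with $i \in [1, 2^{nR_0}]$ and we relabel the codewords

$$x^n(i,j,k)\quad\text{with}\quad (i,j,k) \in [1, 2^{nR}]\times[1, 2^{nR_0}]\times[1, 2^{n(R_d-R_0)}].$$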


This binning procedure is the same as the one illustrated in Figure 3.11. The sub-binning 
is revealed to all parties and we consider the following encoding/decoding procedure 
over B blocks of length n. 


• Encoder for block 1. Given $m_1$ and $m'_1$, transmit $x^n(m_1, m'_1, k) \in \mathcal{C}_n$, where $k$ is chosen uniformly at random in $\mathcal{B}_{m_1}(m'_1)$. During the transmission, receive secret key $k_1$ uniformly distributed in $[1, 2^{nR_0}]$ over the feedback channel.

• Encoder for block $b \in [2, B]$. Given $m_b$ and $m'_b$, transmit $x^n(m_b, m'_b \oplus k_{b-1}, k) \in \mathcal{C}_n$, where $\oplus$ denotes modulo-$\lceil 2^{nR_0}\rceil$ addition and $k$ is chosen uniformly at random in $\mathcal{B}_{m_b}(m'_b \oplus k_{b-1})$. During the transmission, receive secret key $k_b$ uniformly distributed in $[1, 2^{nR_0}]$ over the feedback channel.

• Decoder for block $b = 1$. Given $y^n$, use the decoding procedure for $\mathcal{C}_n$ described in Section 3.4.1.

• Decoder for block $b \in [2, B]$. Given $y^n$, use the decoding procedure for $\mathcal{C}_n$ to retrieve $m_b$ and $m'_b \oplus k_{b-1}$. Use $k_{b-1}$ to retrieve $m'_b$.
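The role of the key in blocks $b \geq 2$ can be isolated in a toy loop. The sketch below abstracts the wiretap code away entirely: it assumes the bin index is decoded without error and that each key $k_b$ arrives intact and uniformly distributed, and all helper names are hypothetical. It checks two things: Bob recovers $m'_b$ from $m'_b \oplus k_{b-1}$, and for a uniform key the transmitted bin index is uniform whatever $m'_b$ is, which is the crypto-lemma property invoked later in the proof.

```python
import random


def otp_add(m_pub, key, num_bins):
    """m'_b (+) k_{b-1}: modular addition over the bin indices."""
    return (m_pub + key) % num_bins


def otp_sub(index, key, num_bins):
    """Inverse operation used by the decoder once k_{b-1} is known."""
    return (index - key) % num_bins


def run_blocks(num_blocks, num_bins, rng):
    """Toy block-Markov loop over bin indices only (no channel, no errors)."""
    sent, decoded = [], []
    key = None                                   # no key exists during block 1
    for _ in range(num_blocks):
        m_pub = rng.randrange(num_bins)          # m'_b
        index = m_pub if key is None else otp_add(m_pub, key, num_bins)
        # Bob decodes `index` (assumed error-free) and removes the previous key.
        decoded.append(index if key is None else otp_sub(index, key, num_bins))
        sent.append(m_pub)
        key = rng.randrange(num_bins)            # k_b, fed back during block b
    return sent, decoded


def index_distribution(m_pub, num_bins):
    """Distribution of the transmitted bin index for a uniform key."""
    counts = [0] * num_bins
    for key in range(num_bins):
        counts[otp_add(m_pub, key, num_bins)] += 1
    return [c / num_bins for c in counts]


sent, decoded = run_blocks(num_blocks=10, num_bins=8, rng=random.Random(0))
print(sent == decoded, index_distribution(3, 8))
```

Note that block 1 sends $m'_1$ in the clear, which is exactly the term that later contributes the $R_0/B$ leakage.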


The sub-binning together with the encoding and decoding procedures above defines a $(2^{nBR'}, nB)$ code $\hat{\mathcal{C}}_{nB}$ of length $nB$ for the WTC with secure feedback. We assume that messages $M_b$ and $M'_b$, for $b \in [1, B]$, are uniformly distributed so that the rate of $\hat{\mathcal{C}}_{nB}$ is

$$R' = R + R_0 \geq \min\big(I(X;Y),\ I(X;Y) - I(X;Z) + R_f\big) - \delta(\epsilon).$$


We ignore the fact that the distribution of the codewords of $\mathcal{C}_n$ may be slightly non-uniform because of the binning, and we refer the reader to Section 3.4.1 for details on how to deal with this subtlety.

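Because the secret keys $k_b$ are perfectly known both to the encoder and to the decoder, the union bound shows that the probability of error for $\hat{\mathcal{C}}_{nB}$ is at most $B$ times that of $\mathcal{C}_n$:

$$P_e(\hat{\mathcal{C}}_{nB}) \leq B\,P_e(\mathcal{C}_n) \leq B\,\delta_\epsilon(n).$$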


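For $b \in [1, B]$, we let $Z_b^n$ denote the eavesdropper’s observation in block $b$. The information leaked $\mathbf{L}(\hat{\mathcal{C}}_{nB})$ then satisfies

$$\begin{aligned}
\frac{1}{nB}\mathbf{L}(\hat{\mathcal{C}}_{nB}) &= \frac{1}{nB}\sum_{b=1}^{B} I(M_bM'_b;Z_b^n|\hat{\mathcal{C}}_{nB})\\
&= \frac{1}{B}\sum_{b=1}^{B}\left(\frac{1}{n}I(M_b;Z_b^n|\hat{\mathcal{C}}_{nB}) + \frac{1}{n}I(M'_b;Z_b^n|M_b\hat{\mathcal{C}}_{nB})\right),
\end{aligned}$$

where the first equality follows because the observation $Z_b^n$ in block $b$ depends only on the messages $M_b$ and $M'_b$ of that block (and on the independent key $k_{b-1}$). For $b \geq 2$, message $M'_b$ is protected with a one-time pad and the crypto lemma guarantees that $\frac{1}{n}I(M'_b;Z_b^n|M_b\hat{\mathcal{C}}_{nB}) = 0$. For $b = 1$, we can use the upper bound

$$\frac{1}{n}I(M'_1;Z_1^n|M_1\hat{\mathcal{C}}_{nB}) \leq R_0 + \delta(n).$$

In addition, the construction of $\hat{\mathcal{C}}_{nB}$ guarantees that

$$\frac{1}{n}I(M_b;Z_b^n|\hat{\mathcal{C}}_{nB}) = \frac{1}{n}\mathbf{L}(\mathcal{C}_n) \leq \delta(\epsilon) + \delta_\epsilon(n)\qquad\text{for } b \in [1, B].$$

Therefore,

$$\frac{1}{nB}\mathbf{L}(\hat{\mathcal{C}}_{nB}) \leq \frac{1}{B}\sum_{b=1}^{B}\big(\delta(\epsilon) + \delta_\epsilon(n)\big) + \frac{1}{B}\big(R_0 + \delta(n)\big) = \delta(\epsilon) + \delta(B) + \delta_\epsilon(n).$$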
Hence, the rate–equivocation pair $(R', R' - \delta(\epsilon) - \delta(B))$ is achievable. Since $\epsilon$ can be chosen arbitrarily small and $B$ can be chosen arbitrarily large, we conclude that

$$C_s^{\text{FB}} \geq \max_{p_X}\min\big(I(X;Y),\ I(X;Y) - I(X;Z) + R_f\big).$$

One can also check that $\min(I(X;Y), R_f)$ is an achievable rate if $I(X;Y) - I(X;Z) \leq 0$ for all distributions $p_X$. As in Section 3.5.2, we can introduce a prefix channel $(\mathcal{V}, p_{X|V}, \mathcal{X})$ before the WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ to obtain the desired lower bound.

If all channels are weakly symmetric and the eavesdropper’s channel is noisier than the main channel, we can remove the prefix channel as in the proof of Corollary 3.5, and the symmetry guarantees that both $I(X;Y)$ and $I(X;Y) - I(X;Z)$ are maximized with $X$ uniformly distributed.
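Under these symmetric conditions, the bound of Proposition 3.9 reduces to $\min(C_m, C_m - C_e + R_f)$. The sketch below tabulates it for two BSCs with crossover probabilities 0.1 (Bob) and 0.2 (Eve), an illustrative choice of ours rather than an example from the text. It shows the bound growing one-for-one with $R_f$ until it saturates at $C_m$ once $R_f \geq C_e$, mirroring the choice $R_0 = \min(R_f, R_d)$ in the proof: feedback beyond the dummy-message rate has nothing left to encrypt.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def feedback_bound(p_main, p_eve, r_f):
    """min(C_m, C_m - C_e + R_f) for a BSC(p_main) main channel and a BSC(p_eve)
    eavesdropper channel, assumed noisier (p_main < p_eve <= 0.5)."""
    c_m, c_e = 1 - h2(p_main), 1 - h2(p_eve)
    return min(c_m, c_m - c_e + r_f)


for r_f in (0.0, 0.1, 0.2, 0.3, 0.5):
    print(f"R_f = {r_f:.1f} -> lower bound {feedback_bound(0.1, 0.2, r_f):.4f}")
```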


If the eavesdropper’s channel is physically degraded with respect to the main channel, 
the pragmatic feedback strategy used in Proposition 3.9 turns out to be optimal. The proof 
of optimality is established in Section 4.4 on the basis of results about the secret-key 
capacity. 


Theorem 3.4 (Ardestanizadeh et al.). The secrecy capacity of a DWTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ with confidential rate-limited feedback $R_f$ is

$$C_s^{\text{FB}} = \max_{p_X}\min\big(I(X;Y),\ I(X;Y|Z) + R_f\big).$$


Note that, even if Z = Y and the secrecy capacity without feedback is zero, Proposi- 
tion 3.9 guarantees a non-zero secure communication rate. This is not surprising because 
the feedback link is secure and can always be used to transmit a secret key and perform
a one-time pad encryption; however, we will see in Chapter 4 and Chapter 8 that even 
public feedback enables more secure communications. 
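As a concrete illustration of Theorem 3.4, consider (our assumption, not an example from the text) a DWTC in which the main channel is a BSC($p$) and Eve observes $Y$ through a further BSC($q$), so that her channel is a BSC($p \star q$) with $p \star q = p(1-q) + q(1-p)$. For this degraded cascade, $I(X;Y|Z) = I(X;Y) - I(X;Z)$, and by the symmetry argument at the end of the proof of Proposition 3.9 both terms are maximized by a uniform input, which gives $\min(1 - h_2(p),\ h_2(p\star q) - h_2(p) + R_f)$. Setting $q = 0$ recovers the case $Z = Y$ discussed above, where the entire secure rate comes from the feedback key.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def star(p, q):
    """Crossover probability of a BSC(p) followed by a BSC(q)."""
    return p * (1 - q) + q * (1 - p)


def dwtc_feedback_capacity(p, q, r_f):
    """min(I(X;Y), I(X;Y|Z) + R_f) evaluated at a uniform input."""
    i_xy = 1 - h2(p)
    i_xy_given_z = h2(star(p, q)) - h2(p)
    return min(i_xy, i_xy_given_z + r_f)


for q in (0.0, 0.05, 0.2):
    print(q, [round(dwtc_feedback_capacity(0.1, q, r_f), 4) for r_f in (0.0, 0.2, 1.0)])
```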


Conclusions and lessons learned 


We conclude this chapter by summarizing the lessons learned from the analysis of 
secure communication over noisy DMCs. Most importantly, we proved the existence and 
identified the structure of codes that guarantee reliability and security simultaneously 
over wiretap channels. The specificity of these codes is that they map a given message 
to a bin of codewords, of which one is selected randomly by the encoder. Intuitively, the 
role of the randomness in the encoder is to “confuse” the eavesdropper by compensating 
for the information leaked about the codeword during transmission. 

Unfortunately, as exemplified by Corollary 3.5, secure communications at a non-zero 
rate seem to be possible only when the legitimate receiver has a “physical advantage” 
over the eavesdropper, which we formalized with the notion of “noisier” channels. 
Although there are practical applications for which this condition is met, in particular 
if the Gaussian wiretap channel studied in Chapter 5 is a realistic model (near-field 
communications, RFID transmissions; see Chapter 7), it is fair to acknowledge that 
requiring an explicit advantage over the eavesdropper is a weakness in the model. 
From the results of this chapter alone, one would even be inclined to conclude that the
information-theoretic approach considered in this chapter does not improve on stan- 
dard cryptography, and might merely be an alternative solution for which different 
trade-offs are made. Cryptography imposes restrictions on the computational power of 
the eavesdropper to relax assumptions on the communication channel (it considers the 
worst situation with a noiseless channel to the eavesdropper), whereas information- 
theoretic security seems to require an advantage at the physical layer in order to avoid 
restrictive assumptions on the abilities of the eavesdropper. Nevertheless, the some- 
what unsatisfactory results obtained thus far hinge on the simplicity of the communi- 
cation schemes, which do not account for powerful codes based on interactive com- 
munications. We will see in subsequent chapters that information-theoretically secure 
communications are sometimes possible without an explicit advantage at the physical 
layer. 

We close this section with a review of the explicit and implicit assumptions used in 
the previous sections. Understanding these assumptions and their relevance is crucial to 
assess the potential of physical-layer security from a cryptographic perspective. 


• Knowledge of channel statistics. In general, the lower bound on the eavesdropper’s
equivocation guaranteed by Theorem 3.2 is valid only if the code is tailored to the 
channel, which requires knowledge of the channel statistics both for the main channel 
and for the eavesdropper’s channel. The assumption that the main channel is perfectly 
known is usually reasonable, since Alice and Bob can always cooperate to characterize 
their channel; however, the assumption that the eavesdropper’s channel statistics are 
also known is more questionable. Nevertheless, as discussed in Section 3.5.1, this 
stringent requirement can be somewhat alleviated for a class of stochastically degraded 
channels. 

• Authentication. The wiretap channel model implicitly assumes that the main channel
is authenticated and, therefore, wiretap codes do not provide any protection
against man-in-the-middle attacks; however, this assumption is not too restrictive, 
since authentication mechanisms can be implemented in the upper layers of the pro- 
tocol stack. We shall see in Chapter 7 that it is possible to ensure unconditionally 
secure authentication with a negligible cost in terms of the secure communication 
rate, provided that a short secret key is available initially. 

• Passive attacker. The scope of the results developed in this chapter is strictly restricted
to eavesdropping strategies. Additional techniques are required for situations in which 
the adversary tampers with the channels, for instance by jamming the channel. 

• Perfect random-number generation. The proofs developed in this chapter rely on the
availability of a perfect random-number generator at the transmitter. The reader can 
check that the equivocation decreases if the entropy of the random-number generator 
is not maximum. 

• Weak secrecy. As discussed in Section 3.3, weak secrecy is likely not an appropriate
cryptographic metric. Therefore, the relevance of the results derived in this chapter 
will be fully justified in Chapter 4 when we show that the results do not change if the 
weak secrecy criterion is replaced by the strong secrecy criterion. 
the need for a random key as long as the message to be encrypted had already been
identified. Shannon’s cipher system formalized the problem of secure communications 
over noiseless channels in an information-theoretic framework [1]. Shannon’s work 
includes an analysis of the one-time pad, but the crypto lemma in its most general form 
is due to Forney [20]. The degraded wiretap channel model and the notion of secrecy 
shortly after [22], and a simplified proof of Wyner’s result was proposed by Massey [23].
The extension of Wyner’s results to broadcast channels with confidential messages is 
due to Csiszár and Körner [18]. The proofs presented in this chapter are based on
typical-set decoding in the spirit of [21], and can be easily combined with standard 
random-coding techniques for multi-user channels [3, 6]; however, this approach yields 
only weak secrecy results, and additional steps are required in order to strengthen the 
secrecy criterion. These steps are based on results from secret-key agreement that are 
presented in Section 4.5. Alternative proofs based on more powerful mathematical tools, 
such as graph-coloring techniques [24] or information-spectrum methods [25, 26], can 
be used to derive the secrecy capacity with strong secrecy directly. 

Although stochastic encoders and codes with a binning structure are required for 
secure communications over memoryless channels, this need not be the case for other 
models. For instance, Dunn, Bloch, and Laneman investigated the possibility of secure 
communications over parallel timing channels [27] and showed that deterministic codes 
can achieve non-zero secure rates. 

The various notions of partial ordering of channels were introduced by Körner and 
Marton [28], and the characterization of the relation “being noisier” in terms of the 
concavity of I(X; Y) — I(X; Z) was obtained by Van Dijk [29]. The examples illustrating 
the notions of “noisier” and “less capable” channels given in Section 3.5.1 are due 
to Nair [30]. The combination of wiretap coding and one-time pad was proposed by 
Yamamoto [31], and the wiretap channel with a secure feedback link was investigated 
by Leung-Yan-Cheong [32], Ahlswede and Cai [33], and Ardestanizadeh, Franceschetti,
Javidi, and Kim [34]. 

All of the results presented in this chapter assume a uniformly distributed source of 
messages; nevertheless, Theorem 3.2 and Theorem 3.3 can be generalized to account for 
arbitrarily distributed sources [18, 21], in which case a separation result holds: no rate 
is lost by first compressing the source and then transmitting it over a broadcast channel 
with confidential messages. 

Although we mentioned that equivocation and probability of error are fundamentally 
different quantities, the relation between them was investigated more precisely by Feder 
and Merhav [35]. In particular, they provide lower bounds for the conditional entropy 
as a function of the probability of block error, which confirm that evaluating security 
on the basis of equivocation is a stronger requirement than simply requiring a decoding
error for the eavesdropper. 

An important issue that we have not addressed in this chapter is the numerical compu- 
tation of the secrecy capacity. The presence of auxiliary random variables in Corollary 3.4 
makes this computation difficult, and no efficient generic algorithm is known. Never- 
theless, headway has been made in certain cases. For cases in which the eavesdropper’s 
channel is noisier than the main channel, Yasui, Suko, and Matsushima proposed a 
Blahut–Arimoto-like algorithm to compute the secrecy capacity [36], which was also
shown to be useful when the eavesdropper’s channel is less capable than the main channel 
by Gowtham and Thangaraj [37]. 


Secret-key capacity 


In Chapter 3, we considered the transmission of information over a noisy broadcast 
channel subject to reliability and security constraints; we showed that appropriate cod- 
ing schemes can exploit the presence of noise to confuse the eavesdropper and guarantee 
some amount of information-theoretic security. It is important to note that the wiretap 
channel model assumes that all communications occur over the channel, hence com- 
munications are inherently rate-limited and one-way. Consequently, the results obtained 
do not fully capture the role of noise for secrecy; in particular, for situations in which 
the secrecy capacity is zero, it is not entirely clear whether this stems from the lack 
of any “physical advantage” over the eavesdropper or the restrictions imposed on the 
communication schemes. 

The objective of this chapter is to study more precisely the fundamental role of noise in 
information-theoretic security. Instead of studying how we can communicate messages 
securely over a noisy channel, we now analyze how much secrecy we can extract from 
the noise itself in the form of a secret key. Specifically, we assume that the legitimate 
parties and the eavesdropper observe the realizations of correlated random variables and 
that the legitimate parties attempt to agree on a secret key unknown to the eavesdropper. 
To isolate the role played by noise, we remove restrictions on communication schemes 
and we assume that the legitimate parties can distill their key by communicating over a 
two-way, public, noiseless, and authenticated channel at no cost. Contrary to the case in 
Chapter 3, in which the natural metric of interest was the number of message bits that 
one could transmit securely and reliably per channel use, here the relevant metric is the 
number of secret-key bits distilled per observation of the correlated random variables. 

We start this chapter by introducing two standard models for secret-key agreement, 
called the source model and the channel model, and by defining key-distillation strate- 
gies (Section 4.1). We then discuss the fundamental limits of key generation for a source 
model (Section 4.2) and we study in detail a specific type of key-distillation strategy, 
which we call “sequential key-distillation strategies” (Section 4.3). For these strategies, 
we prove results under a strong secrecy condition and we show how to construct practical 
strategies. Finally, we study the fundamental limit of secret-key generation over a chan- 
nel model (Section 4.4) and show that the fundamental limit remains unchanged under 
a strong secrecy condition (Section 4.5). In other words, we prove that strong secrecy 
comes “for free,” that is, a secure rate achievable with weak secrecy is also achiev- 
able with strong secrecy. This result is crucial to justify a posteriori the cryptographic 
relevance of the results derived in Chapter 3 for a weak-secrecy condition. 


4.1 Source and channel models for secret-key agreement

[Figure 4.1: Source model for secret-key agreement.]


As illustrated in Figure 4.1, a source model for secret-key agreement represents a situation in which three parties, Alice, Bob, and Eve, observe the realizations of a DMS $(\mathcal{XYZ}, p_{XYZ})$ with three components. The DMS is assumed to be outside the control of all parties, but its statistics are known. By convention, component $X$ is observed by Alice, component $Y$ by Bob, and component $Z$ by Eve. Alice and Bob's objective is to process their observations and agree on a key $K$ about which Eve should have no information. To capture the essence of the problem, few restrictions are placed on the communication between Alice and Bob: they can exchange messages over a noiseless, two-way, and authenticated channel. However, to avoid trivializing the problem, the two-way channel is public, that is, all messages are overheard by Eve, and the existence of the public channel does not provide Alice and Bob with an explicit advantage over Eve. The only real simplifying assumption is the existence of an authentication mechanism that prevents Eve from tampering with communications over the public channel. Finally, we allow Alice and Bob to randomize the messages they transmit, which we model with sources of local randomness as done in Chapter 3. Alice has access to a DMS $(\mathcal{R}_x, p_{R_x})$ and Bob has access to a DMS $(\mathcal{R}_y, p_{R_y})$, which are mutually independent and independent of the DMS $(\mathcal{XYZ}, p_{XYZ})$. The rules by which Alice and Bob compute the messages they exchange over the public channel and agree on a key define a key-distillation strategy.


1 We use the term “strategy” instead of “code” because a key does not carry information by itself.

Figure 4.2 Channel model for secret-key agreement.

Definition 4.1. A $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ for a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ consists of

• a key alphabet $\mathcal{K} = [\![1, \lceil 2^{nR} \rceil]\!]$;
• an alphabet $\mathcal{A}$ used by Alice to communicate over the public channel;
• an alphabet $\mathcal{B}$ used by Bob to communicate over the public channel;
• a source of local randomness for Alice $(\mathcal{R}_x, p_{R_x})$;
• a source of local randomness for Bob $(\mathcal{R}_y, p_{R_y})$;
• an integer $r \in \mathbb{N}^*$ that represents the number of rounds of communication;
• $r$ encoding functions $f_i : \mathcal{X}^n \times \mathcal{B}^{i-1} \times \mathcal{R}_x \to \mathcal{A}$ for $i \in [\![1, r]\!]$;
• $r$ encoding functions $g_i : \mathcal{Y}^n \times \mathcal{A}^{i} \times \mathcal{R}_y \to \mathcal{B}$ for $i \in [\![1, r]\!]$;
• a key-distillation function $\kappa_a : \mathcal{X}^n \times \mathcal{B}^r \times \mathcal{R}_x \to \mathcal{K}$;
• a key-distillation function $\kappa_b : \mathcal{Y}^n \times \mathcal{A}^r \times \mathcal{R}_y \to \mathcal{K}$;

and operates as follows:

• Alice observes $n$ realizations of the source $x^n$ while Bob observes $y^n$;
• Alice generates a realization $r_x$ of her source of local randomness while Bob generates $r_y$ from his;
• in round $i \in [\![1, r]\!]$, Alice transmits $a_i = f_i(x^n, b^{i-1}, r_x)$ while Bob transmits $b_i = g_i(y^n, a^{i}, r_y)$;
• after round $r$, Alice computes a key $k = \kappa_a(x^n, b^r, r_x)$ while Bob computes a key $\hat{k} = \kappa_b(y^n, a^r, r_y)$.

By convention, we set $\mathcal{A}^0 \triangleq \emptyset$ and $\mathcal{B}^0 \triangleq \emptyset$. Note that the number of rounds $r$ and the DMSs $(\mathcal{R}_x, p_{R_x})$ and $(\mathcal{R}_y, p_{R_y})$ can be optimized as part of the strategy design. The $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ is assumed known ahead of time to Alice, Bob, and Eve.
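The message schedule of Definition 4.1 (in each round Alice speaks first using $b^{i-1}$, Bob answers using $a^i$, and keys are computed after round $r$) can be sketched in Python as follows. This is a minimal illustration under our own assumptions: `run_source_strategy` and the one-round toy strategy are hypothetical names, the local randomness is passed in as opaque values, and the toy strategy (Alice reveals her first symbol and Bob acknowledges whether it matches his) only illustrates the data flow; it makes no secrecy claim.

```python
from typing import Any, Callable, List, Sequence, Tuple

# f(i, x, b_prev, r_x) -> a_i  and  g(i, y, a_upto_i, r_y) -> b_i
Encoder = Callable[[int, Sequence[int], List[Any], Any], Any]
# kappa(own observation, other party's messages, local randomness) -> key
KeyFunction = Callable[[Sequence[int], List[Any], Any], Any]


def run_source_strategy(x: Sequence[int], y: Sequence[int], r: int,
                        f: Encoder, g: Encoder,
                        kappa_a: KeyFunction, kappa_b: KeyFunction,
                        r_x: Any = None, r_y: Any = None
                        ) -> Tuple[Any, Any, List[Any], List[Any]]:
    """Run the r-round schedule of Definition 4.1; return (k, k_hat, a^r, b^r)."""
    a: List[Any] = []
    b: List[Any] = []
    for i in range(1, r + 1):
        a.append(f(i, x, list(b), r_x))  # Alice uses b^{i-1}
        b.append(g(i, y, list(a), r_y))  # Bob uses a^i
    k = kappa_a(x, b, r_x)               # kappa_a(x^n, b^r, r_x)
    k_hat = kappa_b(y, a, r_y)           # kappa_b(y^n, a^r, r_y)
    return k, k_hat, a, b


# Hypothetical one-round toy strategy: Alice reveals x_1, Bob acknowledges a
# match, and both keep the remaining symbols only if the acknowledgment is 1.
def toy_f(i, x, b_prev, r_x):
    return x[0]


def toy_g(i, y, a_upto_i, r_y):
    return int(y[0] == a_upto_i[-1])


def toy_kappa_a(x, b, r_x):
    return tuple(x[1:]) if b[-1] == 1 else None


def toy_kappa_b(y, a, r_y):
    return tuple(y[1:]) if y[0] == a[-1] else None


if __name__ == "__main__":
    print(run_source_strategy((1, 0, 1, 1), (1, 0, 1, 1), 1,
                              toy_f, toy_g, toy_kappa_a, toy_kappa_b))
```

Passing copies of the message lists makes explicit which part of the transcript each function is allowed to see, which is the only structural constraint the definition imposes.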

The source model assumes the existence of an uncontrollable external source of randomness and abstracts the physical origin of the randomness completely. Although this is a strong assumption, there are several practical situations in which it is a reasonable model. For instance, in wireless sensor networks, devices may monitor a physical phenomenon (such as changes in temperature or pressure) whose statistics may be known but whose complexity is such that we can reasonably assume that it cannot be controlled. Nevertheless, it is legitimate to wonder what happens if the source of randomness is partially controlled by one of the parties. Situations in which the source is partially controlled by the eavesdropper are not fully understood and are not covered in this book. We refer the interested reader to the bibliographical notes at the end of the chapter for references to existing results.

It is somewhat less arduous to study the situation in which the source is partially controlled by one of the legitimate parties. In this case, the model is called a channel model for secret-key agreement and is illustrated in Figure 4.2. Instead of observing the realizations of an external source, Bob and Eve now observe the outputs of a DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ whose input is controlled by Alice. Alice and Bob again have access to a public, noiseless, two-way, and authenticated channel over which they can exchange messages to agree on a secret key. We also assume that Alice and Bob have access to sources of local randomness $(\mathcal{R}_x, p_{R_x})$ and $(\mathcal{R}_y, p_{R_y})$ to randomize their communications. Key-distillation strategies for the channel model are more sophisticated than those for the source model, because Alice can use the feedback provided by Bob to adapt the symbols she sends over the channel. Despite the similarity between the channel model for secret-key agreement and the wiretap channel studied in Chapter 3, note that the problems are different: in a channel model for secret-key agreement, the broadcast channel is used not only to communicate messages but also to generate randomness.


Definition 4.2. A $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ for a channel model with DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ consists of

• a key alphabet $\mathcal{K} = [\![1, \lceil 2^{nR} \rceil]\!]$;
• an alphabet $\mathcal{A}$ used by Alice to communicate over the public channel;
• an alphabet $\mathcal{B}$ used by Bob to communicate over the public channel;
• a source of local randomness for Alice $(\mathcal{R}_x, p_{R_x})$;
• a source of local randomness for Bob $(\mathcal{R}_y, p_{R_y})$;
• an integer $r \in \mathbb{N}^*$ that represents the number of rounds of communication;
• a set of $n$ distinct integers $\{i_j\}_{j=1}^{n} \subset [\![1, r]\!]$, with $i_1 < i_2 < \dots < i_n$, that represents the rounds in which Alice transmits a symbol over the channel;
• $r - n$ encoding functions $f_i : \mathcal{B}^{i-1} \times \mathcal{R}_x \to \mathcal{A}$ for $i \in [\![1, r]\!] \setminus \{i_j\}_{j=1}^{n}$;
• $r - n$ encoding functions $g_i$ for $i \in [\![1, r]\!] \setminus \{i_j\}_{j=1}^{n}$ of the form $g_i : \mathcal{Y}^j \times \mathcal{A}^{i-1} \times \mathcal{R}_y \to \mathcal{B}$ if $i \in [\![i_j + 1, i_{j+1} - 1]\!]$;
• $n$ functions $h_j : \mathcal{B}^{i_j - 1} \times \mathcal{R}_x \to \mathcal{X}$ for $j \in [\![1, n]\!]$ to generate channel inputs;
• a key-distillation function $\kappa_a : \mathcal{X}^n \times \mathcal{B}^r \times \mathcal{R}_x \to \mathcal{K}$;
• a key-distillation function $\kappa_b : \mathcal{Y}^n \times \mathcal{A}^r \times \mathcal{R}_y \to \mathcal{K}$;

and operates as follows:

• Alice generates a realization $r_x$ of her source of local randomness while Bob generates $r_y$ from his;
• in round $i \in [\![1, i_1 - 1]\!]$, Alice transmits message $a_i = f_i(b^{i-1}, r_x)$ and Bob transmits message $b_i = g_i(a^{i-1}, r_y)$;
• in round $i_j$ with $j \in [\![1, n]\!]$, Alice transmits symbol $x_j = h_j(b^{i_j - 1}, r_x)$ over the channel, and Bob and Eve observe the symbols $y_j$ and $z_j$, respectively;
• in round $i \in [\![i_j + 1, i_{j+1} - 1]\!]$, Alice transmits message $a_i = f_i(b^{i-1}, r_x)$ and Bob transmits message $b_i = g_i(y^j, a^{i-1}, r_y)$;
• after the last round, Alice computes a key $k = \kappa_a(x^n, b^r, r_x)$ and Bob computes a key $\hat{k} = \kappa_b(y^n, a^r, r_y)$.

By convention, we set $i_{n+1} \triangleq r + 1$, $i_0 \triangleq 0$, $\mathcal{A}^0 \triangleq \emptyset$, and $\mathcal{B}^0 \triangleq \emptyset$. Note that the number of rounds $r$, the indices $\{i_j\}_{j=1}^{n}$ of the rounds during which a symbol is transmitted over the channel, and the sources of local randomness $(\mathcal{R}_x, p_{R_x})$ and $(\mathcal{R}_y, p_{R_y})$ can be optimized as part of the strategy design. Again, the key-distillation strategy $\mathcal{S}_n$ is assumed known to Alice, Bob, and Eve ahead of time.


Remark 4.1. A wiretap code $\mathcal{C}_n$ is also a key-distillation strategy for a channel model, since Alice can use $\mathcal{C}_n$ to transmit uniform secret keys to Bob directly without using the public channel. In general, key-distillation strategies that exploit the public channel are more powerful, but we will see in Section 4.4 an example of a channel model for which using a wiretap code is an optimal key-distillation strategy.


For both source and channel models, the performance of a $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ is measured in terms of the average probability of error

$$P_e(\mathcal{S}_n) \triangleq \mathbb{P}\big[K \neq \hat{K} \,\big|\, \mathcal{S}_n\big],$$

in terms of the information leakage to the eavesdropper,

$$L(\mathcal{S}_n) \triangleq I(K; Z^n A^r B^r | \mathcal{S}_n),$$

and in terms of the uniformity of the keys,

$$U(\mathcal{S}_n) \triangleq \log\lceil 2^{nR} \rceil - H(K | \mathcal{S}_n).$$

Note that, by definition, $U(\mathcal{S}_n) \geq 0$, with equality if and only if the key is exactly uniformly distributed.


Remark 4.2. It is possible to combine the information leaked to the eavesdropper and the uniformity of the key into a single quantity called the security index and defined as

$$\log\lceil 2^{nR} \rceil - H(K | Z^n A^r B^r \mathcal{S}_n) = U(\mathcal{S}_n) + L(\mathcal{S}_n).$$

The security index is equal to zero if and only if the key is uniformly distributed and unknown to the eavesdropper. However, we choose to study $U(\mathcal{S}_n)$ and $L(\mathcal{S}_n)$ independently to emphasize that these are different constraints.
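To make the three performance measures concrete, the following Python sketch evaluates $P_e$, $L$, and $U$ for a single-shot toy strategy and checks the identity of Remark 4.2, namely that the security index equals $U + L$. The joint distribution is a hypothetical example (a biased one-bit key, a 5% disagreement probability, and an eavesdropper who sees the key with probability 0.2 and an erasure otherwise), and the single observation `v` stands in for Eve's whole view $(Z^n, A^r, B^r)$.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal of a pmf over tuples, keeping the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def key_metrics(joint, key_size):
    """joint maps (k, k_hat, v) -> probability, v being Eve's whole view.

    Returns (P_e, L, U) with L = I(K; V) and U = log2|K| - H(K), in bits.
    """
    p_err = sum(p for (k, k_hat, _), p in joint.items() if k != k_hat)
    h_k = entropy(marginal(joint, (0,)))
    h_v = entropy(marginal(joint, (2,)))
    h_kv = entropy(marginal(joint, (0, 2)))
    leakage = h_k + h_v - h_kv
    uniformity = math.log2(key_size) - h_k
    return p_err, leakage, uniformity


def security_index(joint, key_size):
    """log2|K| - H(K | V), the quantity of Remark 4.2."""
    h_kv = entropy(marginal(joint, (0, 2)))
    h_v = entropy(marginal(joint, (2,)))
    return math.log2(key_size) - (h_kv - h_v)


def toy_joint():
    """Hypothetical key: P[K=0] = 0.6, Bob errs w.p. 0.05, Eve sees K w.p. 0.2."""
    joint = defaultdict(float)
    for k, pk in ((0, 0.6), (1, 0.4)):
        for k_hat, pb in ((k, 0.95), (1 - k, 0.05)):
            for v, pv in ((k, 0.2), ("e", 0.8)):  # "e" is an erasure
                joint[(k, k_hat, v)] += pk * pb * pv
    return dict(joint)


if __name__ == "__main__":
    print(key_metrics(toy_joint(), 2), security_index(toy_joint(), 2))
```

Keeping $L$ and $U$ as separate outputs mirrors the choice made in Remark 4.2: a key can fail either constraint independently of the other.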


Definition 4.3. A weak secret-key rate $R$ is achievable for a source or channel model if there exists a sequence of $(2^{nR}, n)$ key-distillation strategies $\{\mathcal{S}_n\}_{n \geq 1}$ such that

$$\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0 \quad \text{(reliability)}, \tag{4.1}$$

$$\lim_{n \to \infty} \frac{1}{n} L(\mathcal{S}_n) = 0 \quad \text{(weak secrecy)}, \tag{4.2}$$

$$\lim_{n \to \infty} \frac{1}{n} U(\mathcal{S}_n) = 0 \quad \text{(weak uniformity)}. \tag{4.3}$$

The corresponding keys are called weak secret keys. If the strategies $\{\mathcal{S}_n\}_{n \geq 1}$ exploit public messages sent either from Alice to Bob only or from Bob to Alice only, the secret-key rate $R$ is said to be achievable with one-way communication; otherwise, $R$ is said to be achievable with two-way communication.


Condition (4.1) means that Alice and Bob should agree on a common key with high probability. Condition (4.2) requires that Eve, who has access to information through her observations $Z^n$ and the messages $A^r B^r$ exchanged over the public channel, obtains a negligible rate of information about the key; this condition is tantamount to the full secrecy rate condition studied in the context of wiretap channels. Finally, Condition (4.3) requires the key rate to be almost that of a uniform key, which is a necessary property of secret keys if they are to be used to protect messages with a one-time pad as seen in Theorem 3.1. As already discussed in Section 3.3, Conditions (4.2) and (4.3) are called “weak” because they constrain only the rate of information leaked and the rate at which the entropy of a key approaches that of a uniform distribution, not the unnormalized quantities $L(\mathcal{S}_n)$ and $U(\mathcal{S}_n)$.


Remark 4.3. In principle, we could attempt to characterize a rate–equivocation region for keys and consider partially secret keys for which $(1/n) H(K | Z^n A^r B^r \mathcal{S}_n) \geq R_e$ with $R_e < R$; however, it is not clear what the cryptographic purpose of such keys would be. In addition, note that partially secret keys can be obtained by expanding secret keys with a known function or by adding known bits; therefore, without loss of generality, we restrict ourselves to the analysis of secret keys as in Definition 4.3.


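Before we investigate the fundamental limits of achievable weak secret-key rates, it is worth looking at how weak secret keys can be used. Consider a $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ such that

$$P_e(\mathcal{S}_n) \leq \epsilon, \quad \frac{1}{n} L(\mathcal{S}_n) \leq \epsilon, \quad \text{and} \quad \frac{1}{n} U(\mathcal{S}_n) \leq \epsilon$$

for some $\epsilon > 0$. Assume the resulting weak secret key $K$ is used in the one-time-pad encryption of a message $M \in \mathcal{K}$ that is independent of all variables involved in the key-generation process. Since $K$ is neither exactly uniform nor totally unknown to the adversary, we should not expect to obtain the perfect secrecy discussed in Chapter 3. In fact, note that

$$\begin{aligned}
I(M \oplus K, A^r, B^r, Z^n; M | \mathcal{S}_n) &= I(A^r B^r Z^n; M | \mathcal{S}_n) + I(M \oplus K; M | A^r B^r Z^n \mathcal{S}_n) \\
&\overset{(a)}{=} I(M \oplus K; M | A^r B^r Z^n \mathcal{S}_n) \\
&\overset{(b)}{=} H(M \oplus K | A^r B^r Z^n \mathcal{S}_n) - H(K | M A^r B^r Z^n \mathcal{S}_n) \\
&\leq \log\lceil 2^{nR} \rceil - H(K | M A^r B^r Z^n \mathcal{S}_n) \\
&\overset{(c)}{\leq} H(K | \mathcal{S}_n) + n\epsilon - H(K | A^r B^r Z^n \mathcal{S}_n) \\
&= L(\mathcal{S}_n) + n\epsilon \\
&\overset{(d)}{\leq} n\delta(\epsilon),
\end{aligned} \tag{4.4}$$

where (a) follows from $I(A^r B^r Z^n; M | \mathcal{S}_n) = 0$ since $M$ is independent of $A^r B^r Z^n$, (b) follows from $H(M \oplus K | M A^r B^r Z^n \mathcal{S}_n) = H(K | M A^r B^r Z^n \mathcal{S}_n)$, (c) follows from $(1/n) U(\mathcal{S}_n) \leq \epsilon$ together with $H(K | M A^r B^r Z^n \mathcal{S}_n) = H(K | A^r B^r Z^n \mathcal{S}_n)$, which holds because $M$ is independent of $(K, A^r, B^r, Z^n)$, and (d) follows from $(1/n) L(\mathcal{S}_n) \leq \epsilon$. Therefore, we can prove only that this encryption guarantees the weak secrecy condition

$$\frac{1}{n} I(M \oplus K, A^r, B^r, Z^n; M | \mathcal{S}_n) \leq \delta(\epsilon).$$

This result is somewhat unsatisfactory, but it can be improved if we strengthen the notion of the achievable rate as follows.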


Definition 4.4. A strong secret-key rate $R$ is achievable for a source or channel model if there exists a sequence of $(2^{nR}, n)$ key-distillation strategies $\{\mathcal{S}_n\}_{n \geq 1}$ such that

$$\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0 \quad \text{(reliability)}, \tag{4.5}$$

$$\lim_{n \to \infty} L(\mathcal{S}_n) = 0 \quad \text{(strong secrecy)}, \tag{4.6}$$

$$\lim_{n \to \infty} U(\mathcal{S}_n) = 0 \quad \text{(strong uniformity)}. \tag{4.7}$$

The corresponding keys are called strong secret keys.


The secrecy condition and the uniformity condition for strong secret-key rates differ from their weak counterparts in that they do not involve any normalization by $n$; hence, achievable strong secret-key rates are also achievable weak secret-key rates. Consider now a $(2^{nR}, n)$ key-distillation strategy such that

$$P_e(\mathcal{S}_n) \leq \epsilon, \quad L(\mathcal{S}_n) \leq \epsilon, \quad \text{and} \quad U(\mathcal{S}_n) \leq \epsilon,$$

and assume as done earlier that the resulting strong secret key $K$ is used for the one-time-pad encryption of a message $M$ that is independent of all random variables involved in the key-distillation process. We can reiterate the calculation in (4.4), and the reader can check that we can then guarantee the strong secrecy condition

$$I(M \oplus K, A^r, B^r, Z^n; M | \mathcal{S}_n) \leq \delta(\epsilon).$$

Note that this still does not match the perfect secrecy condition, but it does guarantee a reasonable secrecy level if $\epsilon$ is small enough.
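Both (4.4) and its strong counterpart rest on the single-shot inequality $I(M \oplus K, V; M) \leq \log|\mathcal{K}| - H(K|V) = U + L$, where $V$ collects Eve's observations. The Python sketch below checks it numerically under our own assumptions: key and message alphabet $\mathbb{Z}_q$ with $\oplus$ taken as addition modulo $q$, a hypothetical biased key that Eve sees with probability 1/2, and a hypothetical non-uniform message distribution.

```python
import math
from collections import defaultdict


def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marg(joint, idx):
    """Marginal of a pmf over tuples, keeping the coordinates in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out


def mutual_info(joint, a, b):
    """I(A;B) in bits, where a and b are tuples of coordinate indices."""
    return H(marg(joint, a)) + H(marg(joint, b)) - H(marg(joint, a + b))


def otp_leakage(p_kv, p_m, q):
    """One-time pad over Z_q with an imperfect key.

    p_kv maps (k, v) -> probability, v being Eve's view of the key; p_m maps
    m -> probability, and M is independent of (K, V).
    Returns (I(C, V; M), U + L) with C = (M + K) mod q.
    """
    joint = defaultdict(float)  # outcomes (m, c, v)
    for (k, v), pkv in p_kv.items():
        for m, pm in p_m.items():
            joint[(m, (m + k) % q, v)] += pm * pkv
    lhs = mutual_info(joint, (0,), (1, 2))
    uniformity = math.log2(q) - H(marg(p_kv, (0,)))  # U = log|K| - H(K)
    leakage = mutual_info(p_kv, (0,), (1,))          # L = I(K; V)
    return lhs, uniformity + leakage


def biased_key():
    """Hypothetical key on Z_4 that Eve sees w.p. 1/2 ('?' otherwise)."""
    p_kv = {}
    for k, pk in enumerate([0.4, 0.3, 0.2, 0.1]):
        p_kv[(k, k)] = 0.5 * pk
        p_kv[(k, "?")] = 0.5 * pk
    return p_kv


if __name__ == "__main__":
    print(otp_leakage(biased_key(), {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}, 4))
```

When the key is uniform and independent of Eve's view, both returned quantities vanish, which recovers the perfect secrecy of Chapter 3.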


4.2 Secret-key capacity of the source model


In this section, we study the secret-key capacity, which is defined as the supremum 
of secret-key rates achievable for a source model. Since a secret-key rate is defined 
as a number of secret-key bits per realization of a DMS, the secret-key capacity does 
not account for the amount of communication required to distill keys, which could be 
arbitrarily large. 


Definition 4.5. The weak secret-key capacity of a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ is

$$C_s^{\mathrm{WSM}} \triangleq \sup\{R : R \text{ is an achievable weak secret-key rate}\}.$$

Similarly, the strong secret-key capacity of a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ is

$$C_s^{\mathrm{SSM}} \triangleq \sup\{R : R \text{ is an achievable strong secret-key rate}\}.$$

When required, we write $C_s^{\mathrm{WSM}}(p_{XYZ})$ in place of $C_s^{\mathrm{WSM}}$ to explicitly specify the underlying distribution of the DMS under consideration. It follows directly from the definition of achievable secret-key rates that $C_s^{\mathrm{SSM}} \leq C_s^{\mathrm{WSM}}$. We will show at the end of Section 4.3 that $C_s^{\mathrm{SSM}} = C_s^{\mathrm{WSM}}$ but, until then, we restrict our attention to the weak secret-key capacity. Although this restriction will prove unnecessary, the study of weak secret-key capacity is less arduous than that of strong secret-key capacity and still provides useful insight into the design of key-distillation strategies. In addition, note that any upper bound we derive for $C_s^{\mathrm{WSM}}$ is automatically an upper bound for $C_s^{\mathrm{SSM}}$.

A closed-form expression for the secret-key capacity for a general source model 
remains elusive. Nevertheless, it is possible to obtain simple upper and lower bounds 
that are useful in many situations. 


Theorem 4.1 (Maurer, Ahlswede, and Csiszár). The weak secret-key capacity of a source model $(\mathcal{XYZ}, p_{XYZ})$ satisfies

$$I(X;Y) - \min\big(I(X;Z), I(Y;Z)\big) \leq C_s^{\mathrm{WSM}} \leq \min\big(I(X;Y), I(X;Y|Z)\big).$$

Moreover, the secret-key rate $I(X;Y) - \min(I(X;Z), I(Y;Z))$ is achievable with one-way communication.


Proof. We provide proofs of the result in Sections 4.2.1, 4.2.2, and 4.2.3. The lower bound is established twice: the proof in Section 4.2.1 leverages the results obtained for WTCs in Chapter 3, whereas the proof in Section 4.2.2 provides a more direct approach based on Slepian–Wolf codes. The upper bound is established in Section 4.2.3.


The lower bound $I(X;Y) - \min(I(X;Z), I(Y;Z))$ can be understood as the difference between the information rate between Alice and Bob and some information rate leaked to Eve. However, in contrast to the results obtained in Chapter 3, Alice and Bob can choose whether the information rate obtained by Eve is leaked from Alice ($I(X;Z)$) or from Bob ($I(Y;Z)$). We will see in the course of the proof that this result stems from the possibility of two-way communication over the public channel. Before we prove Theorem 4.1, it is also useful to note that, in general, the bounds in Theorem 4.1 are loose. In Section 4.3.1, we will see an example of a source model for which the lower bound is useless because $I(X;Y) - I(X;Z) < 0$ and $I(X;Y) - I(Y;Z) < 0$. In Section 4.2.4, we will provide an example of a source model for which $I(X;Y|Z) > 0$ and $I(X;Y) > 0$ while $C_s^{\mathrm{WSM}} = 0$. Nevertheless, there are several situations in which the bounds are tight.
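Both bounds of Theorem 4.1 are easy to evaluate for a small joint pmf. The Python sketch below (helper names are our own) computes them from entropies of marginals. The example is a hypothetical source in which $X$ and $Y$ are independent uniform bits and Eve observes $Z = X \oplus Y$: there $I(X;Y|Z) = 1$ bit while $I(X;Y) = 0$, so it is the minimum in the upper bound that shows no secret key can be distilled.

```python
import itertools
import math
from collections import defaultdict


def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marg(joint, idx):
    """Marginal of a pmf on (x, y, z) tuples, keeping the coordinates in idx."""
    out = defaultdict(float)
    for xyz, p in joint.items():
        out[tuple(xyz[i] for i in idx)] += p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C); c=() gives I(A;B)."""
    return (H(marg(joint, a + c)) + H(marg(joint, b + c))
            - H(marg(joint, a + b + c)) - H(marg(joint, c)))


def theorem_4_1_bounds(joint):
    """Return the information quantities and both bounds of Theorem 4.1."""
    X, Y, Z = (0,), (1,), (2,)
    terms = {
        "I(X;Y)": cmi(joint, X, Y),
        "I(X;Z)": cmi(joint, X, Z),
        "I(Y;Z)": cmi(joint, Y, Z),
        "I(X;Y|Z)": cmi(joint, X, Y, Z),
    }
    terms["lower"] = terms["I(X;Y)"] - min(terms["I(X;Z)"], terms["I(Y;Z)"])
    terms["upper"] = min(terms["I(X;Y)"], terms["I(X;Y|Z)"])
    return terms


# Hypothetical source: X, Y independent uniform bits and Z = X xor Y.
xor_source = {(x, y, x ^ y): 0.25 for x, y in itertools.product((0, 1), repeat=2)}

if __name__ == "__main__":
    for name, value in theorem_4_1_bounds(xor_source).items():
        print(f"{name:9s} {value:.4f}")
```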


Corollary 4.1. Consider a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$:

• if $Z$ is independent of $(X, Y)$, then $C_s^{\mathrm{WSM}} = I(X;Y)$;
• if $X \to Y \to Z$ forms a Markov chain, then $C_s^{\mathrm{WSM}} = I(X;Y) - I(X;Z)$;
• if $Y \to X \to Z$ forms a Markov chain, then $C_s^{\mathrm{WSM}} = I(X;Y) - I(Y;Z)$.


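Proof. If $Z$ is independent of $X$ and $Y$, then both bounds in Theorem 4.1 are equal to $I(X;Y)$. If $X \to Y \to Z$ forms a Markov chain, then $I(X;Z|Y) = 0$ and

$$\begin{aligned}
I(X;Y|Z) &= I(X;YZ) - I(X;Z) \\
&= I(X;Y) + I(X;Z|Y) - I(X;Z) \\
&= I(X;Y) - I(X;Z).
\end{aligned}$$

In addition, the data-processing inequality gives $I(X;Z) \leq I(Y;Z)$, so that the lower bound in Theorem 4.1 is also $I(X;Y) - I(X;Z)$, and the upper bound $\min(I(X;Y), I(X;Y) - I(X;Z))$ equals the same value. Finally, if $Y \to X \to Z$ forms a Markov chain, then $Z$ and $Y$ are conditionally independent given $X$, and

$$\begin{aligned}
I(X;Y|Z) &= I(XZ;Y) - I(Y;Z) \\
&= I(X;Y) + I(Z;Y|X) - I(Y;Z) \\
&= I(X;Y) - I(Y;Z);
\end{aligned}$$

since $I(Y;Z) \leq I(X;Z)$ by the data-processing inequality, the lower bound is also $I(X;Y) - I(Y;Z)$.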
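As a numerical sanity check of the second case of Corollary 4.1, the sketch below builds a hypothetical degraded binary source: $X$ is uniform, $Y = X \oplus N_1$, and $Z = Y \oplus N_2$ with independent $N_1 \sim \mathcal{B}(p_1)$ and $N_2 \sim \mathcal{B}(p_2)$, so that $X \to Y \to Z$. It verifies that both bounds of Theorem 4.1 coincide with $I(X;Y) - I(X;Z) = h_b(p_1 \star p_2) - h_b(p_1)$, where $p_1 \star p_2 = p_1(1 - p_2) + p_2(1 - p_1)$.

```python
import itertools
import math
from collections import defaultdict


def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marg(joint, idx):
    out = defaultdict(float)
    for xyz, p in joint.items():
        out[tuple(xyz[i] for i in idx)] += p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits; c=() gives the unconditional I(A;B)."""
    return (H(marg(joint, a + c)) + H(marg(joint, b + c))
            - H(marg(joint, a + b + c)) - H(marg(joint, c)))


def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def degraded_source(p1, p2):
    """X uniform, Y = X xor N1, Z = Y xor N2, hence X -> Y -> Z."""
    joint = defaultdict(float)
    for x, n1, n2 in itertools.product((0, 1), repeat=3):
        y = x ^ n1
        weight = (p1 if n1 else 1 - p1) * (p2 if n2 else 1 - p2)
        joint[(x, y, y ^ n2)] += 0.5 * weight
    return joint


def bounds(joint):
    """Lower and upper bounds of Theorem 4.1."""
    X, Y, Z = (0,), (1,), (2,)
    lower = cmi(joint, X, Y) - min(cmi(joint, X, Z), cmi(joint, Y, Z))
    upper = min(cmi(joint, X, Y), cmi(joint, X, Y, Z))
    return lower, upper


def corollary_value(p1, p2):
    """I(X;Y) - I(X;Z) = h2(p1 * p2) - h2(p1) for the cascaded BSCs."""
    p12 = p1 * (1 - p2) + p2 * (1 - p1)
    return h2(p12) - h2(p1)


if __name__ == "__main__":
    print(bounds(degraded_source(0.1, 0.2)), corollary_value(0.1, 0.2))
```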


Remark 4.4. If we drop the term $Z^n$ in the information leaked $L(\mathcal{S}_n)$ and use $I(K; A^r B^r | \mathcal{S}_n)$ as the measure of secrecy, then the secret-key capacity is called the private-key capacity. The private-key capacity measures the maximum key rate with respect to an eavesdropper who observes communications over the public channel only and who disregards the realizations $Z^n$ of the source. The private-key capacity can also be viewed as a special case of the secret-key capacity for a source model in which $Z$ is independent of $X$ and $Y$.


4.2.1 Secret-key distillation based on wiretap codes


In this section, we establish that the secret-key capacity is at least as large as $I(X;Y) - \min(I(X;Z), I(Y;Z))$ by leveraging the results of Chapter 3 about the secrecy capacity of WTCs. The basic idea of the proof is to analyze a specific key-distillation strategy that creates a conceptual WTC, for which we know the existence of wiretap codes and achievable secure communication rates.

We first show that $I(X;Y) - I(X;Z)$ is an achievable secret-key rate. Assume that Alice wants to send a symbol $u \in \mathcal{X}$ that is independent of the DMS $(\mathcal{XYZ}, p_{XYZ})$ over the public channel. Instead of transmitting $u$ directly, she observes a realization $x \in \mathcal{X}$ of the source and transmits $u \oplus x$, where $\oplus$ denotes addition modulo $|\mathcal{X}|$ over $\mathcal{X}$. At the same time, Bob observes $y$ and Eve observes $z$. In effect, this operation creates a memoryless WTC with input $U$, for which Bob receives the pair of symbols $(Y, U \oplus X)$ and Eve receives the pair of symbols $(Z, U \oplus X)$. We know from Corollary 3.4 that the secrecy capacity of this WTC is at least

$$I(U; Y, U \oplus X) - I(U; Z, U \oplus X) = H(U | Z, U \oplus X) - H(U | Y, U \oplus X),$$

where the distribution $p_U$ over $\mathcal{X}$ can be chosen arbitrarily. Here, we choose $p_U$ to be the uniform distribution over $\mathcal{X}$. Then,


$$\begin{aligned}
H(U | Z, U \oplus X) &= H(U, U \oplus X | Z) - H(U \oplus X | Z) \\
&= H(U | Z) + H(U \oplus X | U, Z) - H(U \oplus X | Z).
\end{aligned}$$

Since $U$ is independent of $(X, Z)$, $H(U|Z) = H(U)$ and $H(U \oplus X | U, Z) = H(X | UZ) = H(X|Z)$. Additionally, since $U$ is uniformly distributed over $\mathcal{X}$, the crypto lemma applies and $H(U \oplus X | Z) = H(U)$. Therefore,

$$H(U | Z, U \oplus X) = H(X|Z).$$

Similarly, we can show that $H(U | Y, U \oplus X) = H(X|Y)$; therefore,

$$I(U; Y, U \oplus X) - I(U; Z, U \oplus X) = H(X|Z) - H(X|Y).$$

Since the secret-key capacity is at least as large as the secrecy capacity of the conceptual WTC, we conclude that

$$C_s^{\mathrm{WSM}} \geq H(X|Z) - H(X|Y) = I(X;Y) - I(X;Z). \tag{4.8}$$

Because the public channel is two-way, we can reverse the roles of Alice and Bob and create another conceptual WTC from Bob to Alice. Following the same arguments as above, we obtain the second lower bound

$$C_s^{\mathrm{WSM}} \geq H(Y|Z) - H(Y|X) = I(Y;X) - I(Y;Z). \tag{4.9}$$

By combining (4.8) and (4.9), we obtain

$$\begin{aligned}
C_s^{\mathrm{WSM}} &\geq \max\big(I(X;Y) - I(X;Z), I(X;Y) - I(Y;Z)\big) \\
&= I(X;Y) - \min\big(I(X;Z), I(Y;Z)\big).
\end{aligned}$$
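The key step in the derivation above, $H(U | Z, U \oplus X) = H(X|Z)$ for $U$ uniform and independent of $(X, Z)$, can be verified by brute force. The sketch below assumes $\mathcal{X} = \mathbb{Z}_q$ with $\oplus$ implemented as addition modulo $q$, and draws a random (hypothetical) joint pmf $p_{XZ}$; the function names are our own.

```python
import itertools
import math
import random
from collections import defaultdict


def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def cond_entropy(joint, a, b):
    """H(A|B) = H(AB) - H(B) for a pmf on tuples; a and b are index tuples."""
    def marg(idx):
        out = defaultdict(float)
        for t, p in joint.items():
            out[tuple(t[i] for i in idx)] += p
        return out
    return H(marg(a + b)) - H(marg(b))


def random_pmf(q, nz, seed=0):
    """Hypothetical random joint pmf p_XZ on Z_q x {0, ..., nz - 1}."""
    rng = random.Random(seed)
    w = {(x, z): rng.random() for x, z in itertools.product(range(q), range(nz))}
    total = sum(w.values())
    return {xz: v / total for xz, v in w.items()}


def crypto_lemma_check(p_xz, q):
    """Return (H(U | Z, U + X), H(X | Z)) with U uniform on Z_q, independent of (X, Z)."""
    joint = defaultdict(float)  # outcomes (u, x, z, (u + x) mod q)
    for (x, z), p in p_xz.items():
        for u in range(q):
            joint[(u, x, z, (u + x) % q)] += p / q
    return cond_entropy(joint, (0,), (2, 3)), cond_entropy(joint, (1,), (2,))


if __name__ == "__main__":
    print(crypto_lemma_check(random_pmf(3, 4), 3))
```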


4.2.2 Secret-key distillation based on Slepian–Wolf codes


In this section, we rederive the achievable rates given by Theorem 4.1 with a more constructive proof that is based on Slepian–Wolf codes. Although this alternative approach does not improve on the bounds already obtained, it is useful because it provides operational insight into the design of key-distillation strategies without relying on the existence of wiretap codes. The use of Slepian–Wolf codes makes the derivation slightly more involved than that in Section 4.2.1, but it bears some similarity to the achievability proof of Theorem 3.2, which is based on the notion of an enhanced channel.

In principle, the definition of key-distillation strategies allows many exchanges of messages over the public channel; however, to make the analysis tractable, we study simpler strategies. The first simplification consists of restricting our attention to strategies that exploit a single, one-way round of communication over the public channel and that do not rely on local randomness. If we assume that Alice is the one transmitting the public message, then such a strategy involves only

• a single encoding function $f : \mathcal{X}^n \to \mathcal{A}$ to compute the message $a$ sent over the public channel;
• Alice's key-distillation function $\kappa_a : \mathcal{X}^n \to \mathcal{K}$;
• Bob's key-distillation function $\kappa_b : \mathcal{Y}^n \times \mathcal{A} \to \mathcal{K}$.


Figure 4.3 Enhanced source model for secret-key agreement.

The second simplification consists of requiring Bob to decode Alice's observation $x^n$ on the basis of his own observation $y^n$ and the public message $a$ instead of computing $k$ directly. We will show that these simple strategies suffice to achieve the rates in Theorem 4.1 by means of a random-binning argument; however, before we do so, it is useful to develop an additional desirable property that these strategies should possess. We expand the information rate $(1/n) L(\mathcal{S}_n)$ leaked to the eavesdropper as follows:

$$\begin{aligned}
\frac{1}{n} L(\mathcal{S}_n) &= \frac{1}{n} I(K; Z^n A | \mathcal{S}_n) \\
&= \frac{1}{n} I(K X^n; Z^n A | \mathcal{S}_n) - \frac{1}{n} I(X^n; Z^n A | K \mathcal{S}_n) \\
&= \frac{1}{n} I(X^n; Z^n A | \mathcal{S}_n) + \frac{1}{n} I(K; Z^n A | X^n \mathcal{S}_n) - \frac{1}{n} I(X^n; Z^n A | K \mathcal{S}_n) \\
&= \frac{1}{n} I(X^n; Z^n A | \mathcal{S}_n) - \frac{1}{n} I(X^n; Z^n A | K \mathcal{S}_n),
\end{aligned}$$

where the last equality follows from $I(K; Z^n A | X^n \mathcal{S}_n) = 0$ since $K$ is a function of $X^n$. Note that, no matter how $A$ is computed, the leakage is minimized if $(1/n) I(X^n; Z^n A | K \mathcal{S}_n)$ is maximized, which happens if $(1/n) H(X^n | Z^n A K \mathcal{S}_n)$ is small, because $I(X^n; Z^n A | K \mathcal{S}_n) = H(X^n | K \mathcal{S}_n) - H(X^n | Z^n A K \mathcal{S}_n)$.

Therefore, we shall prove the existence of a sequence of $(2^{nR}, n)$ strategies $\{\mathcal{S}_n\}_{n \geq 1}$ with a single, one-way round of communication such that

$$\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0, \qquad \lim_{n \to \infty} \frac{1}{n} H(X^n | Z^n A K \mathcal{S}_n) = 0, \tag{4.10}$$

$$\lim_{n \to \infty} \frac{1}{n} U(\mathcal{S}_n) = 0, \qquad \lim_{n \to \infty} \frac{1}{n} L(\mathcal{S}_n) = 0. \tag{4.11}$$
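The decomposition of the leakage used above relies only on $K$ being a function of $X^n$. The short Python check below illustrates it with a hypothetical single-letter stand-in: $X$ plays the role of $X^n$, $V$ plays the role of Eve's view $(Z^n, A)$, the key is $K = X \bmod 2$, and both the source distribution and Eve's channel are drawn at random.

```python
import math
import random
from collections import defaultdict


def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marg(joint, idx):
    out = defaultdict(float)
    for t, p in joint.items():
        out[tuple(t[i] for i in idx)] += p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits; c=() gives I(A;B)."""
    return (H(marg(joint, a + c)) + H(marg(joint, b + c))
            - H(marg(joint, a + b + c)) - H(marg(joint, c)))


def leakage_split(seed=0, nx=8, nv=3):
    """Return (I(K;V), I(X;V) - I(X;V|K)) for K = X mod 2 and a random channel X -> V."""
    rng = random.Random(seed)
    p_x = [rng.random() for _ in range(nx)]
    total = sum(p_x)
    joint = defaultdict(float)  # outcomes (x, k, v)
    for x in range(nx):
        w = [rng.random() for _ in range(nv)]  # unnormalized p(v | x)
        for v in range(nv):
            joint[(x, x % 2, v)] += (p_x[x] / total) * (w[v] / sum(w))
    X, K, V = (0,), (1,), (2,)
    return cmi(joint, K, V), cmi(joint, X, V) - cmi(joint, X, V, K)


if __name__ == "__main__":
    print(leakage_split())
```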


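We now show that the two conditions in (4.10) can be combined into a single reliability constraint by considering the enhanced source model illustrated in Figure 4.3. This model enhances the original secret-key agreement problem by introducing a virtual receiver, hereafter named Charlie, who obtains the same observation $Z^n$ as Eve, overhears the message $A$ over the public channel, and has access to $K$ through an error-free side channel.

Definition 4.6. A $(2^{nR}, 2^{nR_p}, n)$ strategy $\mathcal{S}_n$ for the enhanced source model consists of

• two index sets $\mathcal{K} = [\![1, \lceil 2^{nR} \rceil]\!]$ and $\mathcal{A} = [\![1, \lceil 2^{nR_p} \rceil]\!]$;
• an encoding function $f : \mathcal{X}^n \to \mathcal{A}$;
• a key-distillation function $\kappa_a : \mathcal{X}^n \to \mathcal{K}$;
• a decoding function $g : \mathcal{Y}^n \times \mathcal{A} \to \mathcal{X}^n \cup \{?\}$ for Bob;
• a decoding function $h : \mathcal{Z}^n \times \mathcal{A} \times \mathcal{K} \to \mathcal{X}^n \cup \{?\}$ for Charlie.

Bob's key-distillation function is implicitly defined as $\kappa_b \triangleq \kappa_a \circ g$ if $g(y^n, a) \neq ?$. Writing $\hat{X}^n \triangleq g(Y^n, A)$ and $\tilde{X}^n \triangleq h(Z^n, A, K)$ for Bob's and Charlie's estimates of $X^n$, the reliability performance of a $(2^{nR}, 2^{nR_p}, n)$ strategy $\mathcal{S}_n$ is measured in terms of its average probability of error

$$P_e(\mathcal{S}_n) \triangleq \mathbb{P}\big[\hat{X}^n \neq X^n \text{ or } \tilde{X}^n \neq X^n \,\big|\, \mathcal{S}_n\big],$$

its secrecy performance is measured in terms of the leakage

$$L(\mathcal{S}_n) \triangleq I(K; Z^n A | \mathcal{S}_n),$$

and the uniformity of keys is measured in terms of

$$U(\mathcal{S}_n) \triangleq \log\lceil 2^{nR} \rceil - H(K | \mathcal{S}_n).$$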


Note that a $(2^{nR}, 2^{nR_p}, n)$ strategy $\mathcal{S}_n$ for the enhanced source model is a $(2^{nR}, n)$ secret-key distillation strategy for the original source model, which is subject to a more stringent reliability constraint and for which we explicitly control the rate $R_p$ of communication over the public channel. By construction, the probability of error for the original source model is at most $P_e(\mathcal{S}_n)$ since

$$\mathbb{P}\big[\hat{K} \neq K \,\big|\, \mathcal{S}_n\big] \leq \mathbb{P}\big[\hat{X}^n \neq X^n \,\big|\, \mathcal{S}_n\big] \leq \mathbb{P}\big[\hat{X}^n \neq X^n \text{ or } \tilde{X}^n \neq X^n \,\big|\, \mathcal{S}_n\big] = P_e(\mathcal{S}_n).$$

In addition, Fano's inequality guarantees that $(1/n) H(X^n | A Z^n K \mathcal{S}_n) \leq \delta(P_e(\mathcal{S}_n))$. Therefore, the two constraints in (4.10) are automatically satisfied if $\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0$ for the enhanced source model.

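We are now ready to develop a random-binning argument and show the existence of a sequence of $(2^{nR}, 2^{nR_p}, n)$ strategies $\{\mathcal{S}_n\}_{n \geq 1}$ such that

$$\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0, \quad \lim_{n \to \infty} \frac{1}{n} L(\mathcal{S}_n) = 0, \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} U(\mathcal{S}_n) = 0$$

for some appropriate choice of $R$ and $R_p$.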

Without loss of generality, we assume that $H(X) > H(X|Z) - H(X|Y) > 0$. Let $\epsilon > 0$ and $n \in \mathbb{N}^*$. Let $R > 0$ and $R_p > 0$ be rates to be specified later. We construct a $(2^{nR}, 2^{nR_p}, n)$ strategy as follows.

• Codebook construction. For each sequence $x^n \in \mathcal{T}_\epsilon^n(X)$, draw two indices uniformly at random in the sets $[\![1, \lceil 2^{nR_p} \rceil]\!]$ and $[\![1, \lceil 2^{nR} \rceil]\!]$; these index assignments define the functions $f : \mathcal{X}^n \to [\![1, \lceil 2^{nR_p} \rceil]\!]$ and $\kappa_a : \mathcal{X}^n \to [\![1, \lceil 2^{nR} \rceil]\!]$, which are revealed to all parties.
• Alice's encoder. Given an observation $x^n$, if $x^n \in \mathcal{T}_\epsilon^n(X)$, set $k = \kappa_a(x^n)$ and $a = f(x^n)$; otherwise, set $k = 1$ and $a = 1$.
• Bob's decoder. Given an observation $y^n$ and the public message $a$, output $\hat{x}^n$ if it is the unique sequence such that $(\hat{x}^n, y^n) \in \mathcal{T}_\epsilon^n(XY)$ and $f(\hat{x}^n) = a$; otherwise, output an error $?$. If there is no error, distill a key $\hat{k} = \kappa_a(\hat{x}^n)$.
• Charlie's decoder. Given an observation $z^n$, the public message $a$, and the key $k$, output $\tilde{x}^n$ if it is the unique sequence such that $(\tilde{x}^n, z^n) \in \mathcal{T}_\epsilon^n(XZ)$, $f(\tilde{x}^n) = a$, and $\kappa_a(\tilde{x}^n) = k$; otherwise, output an error $?$.
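The codebook construction and Bob's decoder can be simulated for very short blocks. The sketch below is a toy under explicit assumptions that depart from the proof: the source is a uniform binary $X$ observed by Bob through a binary symmetric channel with crossover probability `p`, all $2^n$ sequences are binned (not only the typical ones), Bob uses minimum-Hamming-distance decoding inside the bin as a stand-in for joint-typicality decoding, and Charlie is not simulated. All function names are hypothetical.

```python
import itertools
import random


def make_strategy(n, rate_p, rate_k, seed=0):
    """Random binning: each binary x^n gets a public index f(x^n) and a key kappa_a(x^n).

    Simplification: all 2^n sequences are binned, not only the typical ones.
    """
    rng = random.Random(seed)
    n_bins = 2 ** round(n * rate_p)   # size of the public alphabet, about 2^{n R_p}
    n_keys = 2 ** round(n * rate_k)   # size of the key alphabet, about 2^{n R}
    f, kappa_a, bins = {}, {}, {}
    for x in itertools.product((0, 1), repeat=n):
        f[x] = rng.randrange(n_bins)
        kappa_a[x] = rng.randrange(n_keys)
        bins.setdefault(f[x], []).append(x)
    return f, kappa_a, bins


def hamming(u, v):
    return sum(ui != vi for ui, vi in zip(u, v))


def bob_decode(y, a, bins):
    """Closest sequence to y in bin a, or None (the error '?') if it is not unique."""
    candidates = bins.get(a, [])
    if not candidates:
        return None
    best = min(hamming(x, y) for x in candidates)
    winners = [x for x in candidates if hamming(x, y) == best]
    return winners[0] if len(winners) == 1 else None


def key_agreement_rate(n=12, p=0.05, rate_p=0.75, rate_k=0.25, trials=200, seed=0):
    """Fraction of trials in which Bob's key equals Alice's key."""
    f, kappa_a, bins = make_strategy(n, rate_p, rate_k, seed)
    rng = random.Random(seed + 1)
    agree = 0
    for _ in range(trials):
        x = tuple(rng.randrange(2) for _ in range(n))      # Alice's observation
        y = tuple(xi ^ int(rng.random() < p) for xi in x)  # Bob's observation, BSC(p)
        x_hat = bob_decode(y, f[x], bins)                  # public message a = f(x^n)
        if x_hat is not None and kappa_a[x_hat] == kappa_a[x]:
            agree += 1
    return agree / trials


if __name__ == "__main__":
    print(key_agreement_rate())
```

With `p = 0.05`, $H(X|Y) = h_b(0.05) \approx 0.29$ bit, so `rate_p = 0.75` satisfies the first constraint in (4.12) with a wide margin; at such a short block length, however, the asymptotic guarantees do not apply and the measured agreement rate is only indicative.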


The random variable that represents the strategy defined by the randomly chosen index assignments is denoted by $\mathcal{S}_n$. We proceed to bound the quantities $\mathbb{E}[P_e(\mathcal{S}_n)]$, $\mathbb{E}[(1/n) L(\mathcal{S}_n)]$, and $\mathbb{E}[(1/n) U(\mathcal{S}_n)]$ separately.

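The upper bound for $\mathbb{E}[P_e(\mathcal{S}_n)]$ is obtained with the approach used in Section 2.3.1 for the Slepian–Wolf theorem. $\mathbb{E}[P_e(\mathcal{S}_n)]$ can be bounded in terms of the events

$$\begin{aligned}
\mathcal{E}_0 &= \{X^n \notin \mathcal{T}_\epsilon^n(X) \text{ or } (X^n, Y^n) \notin \mathcal{T}_\epsilon^n(XY) \text{ or } (X^n, Z^n) \notin \mathcal{T}_\epsilon^n(XZ)\}, \\
\mathcal{E}_1 &= \{\exists x^n \neq X^n : f(x^n) = A \text{ and } (x^n, Y^n) \in \mathcal{T}_\epsilon^n(XY)\}, \\
\mathcal{E}_2 &= \{\exists x^n \neq X^n : \kappa_a(x^n) = K, f(x^n) = A, \text{ and } (x^n, Z^n) \in \mathcal{T}_\epsilon^n(XZ)\},
\end{aligned}$$

since $\mathbb{E}[P_e(\mathcal{S}_n)] \leq \mathbb{P}[\mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2]$. By the union bound, we obtain

$$\mathbb{E}[P_e(\mathcal{S}_n)] \leq \mathbb{P}[\mathcal{E}_0] + \mathbb{P}[\mathcal{E}_1] + \mathbb{P}[\mathcal{E}_2],$$

and, following exactly the same approach as that used in Section 2.3.1 to prove the Slepian–Wolf theorem, we can show that, if

$$R_p > H(X|Y) + \delta(\epsilon) \quad \text{and} \quad R + R_p > H(X|Z) + \delta(\epsilon), \tag{4.12}$$

then $\mathbb{E}[P_e(\mathcal{S}_n)] \leq \delta_\epsilon(n)$.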
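Next, we develop an upper bound for $\mathbb{E}[(1/n) L(\mathcal{S}_n)]$. We expand $\mathbb{E}[(1/n) L(\mathcal{S}_n)]$ as

$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n} L(\mathcal{S}_n)\right] &= \frac{1}{n} I(K; A Z^n | \mathcal{S}_n) \\
&= \frac{1}{n} H(K | \mathcal{S}_n) + \frac{1}{n} H(A Z^n | \mathcal{S}_n) - \frac{1}{n} H(A K Z^n | \mathcal{S}_n) \\
&= \frac{1}{n} H(K | \mathcal{S}_n) + \frac{1}{n} H(A | \mathcal{S}_n) + \frac{1}{n} H(Z^n | A \mathcal{S}_n) - \frac{1}{n} H(A K Z^n | \mathcal{S}_n) \\
&\leq \frac{1}{n} H(K | \mathcal{S}_n) + \frac{1}{n} H(A | \mathcal{S}_n) + H(Z) - \frac{1}{n} H(A K Z^n | \mathcal{S}_n),
\end{aligned} \tag{4.13}$$

where the last inequality follows from $H(Z^n | A \mathcal{S}_n) \leq H(Z^n) = nH(Z)$. We bound the remaining terms on the right-hand side separately. By construction of a $(2^{nR}, 2^{nR_p}, n)$ strategy,

$$\frac{1}{n} H(K | \mathcal{S}_n) \leq R + \delta(n), \qquad \frac{1}{n} H(A | \mathcal{S}_n) \leq R_p + \delta(n). \tag{4.14}$$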

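In addition,

$$\begin{aligned}
\frac{1}{n} H(A K Z^n | \mathcal{S}_n) &= \frac{1}{n} H(A K Z^n X^n | \mathcal{S}_n) - \frac{1}{n} H(X^n | A K Z^n \mathcal{S}_n) \\
&= \frac{1}{n} H(Z^n X^n | \mathcal{S}_n) + \frac{1}{n} H(A K | X^n Z^n \mathcal{S}_n) - \frac{1}{n} H(X^n | A K Z^n \mathcal{S}_n) \\
&= \frac{1}{n} H(Z^n X^n | \mathcal{S}_n) - \frac{1}{n} H(X^n | A K Z^n \mathcal{S}_n),
\end{aligned}$$

where the last equality follows from $H(A K | Z^n X^n \mathcal{S}_n) = 0$ because $A$ and $K$ are functions of $X^n$ for a given strategy. By virtue of Fano's inequality,

$$\begin{aligned}
\frac{1}{n} H(X^n | A K Z^n \mathcal{S}_n) &= \sum_{s_n} p_{\mathcal{S}_n}(s_n) \frac{1}{n} H(X^n | A K Z^n, \mathcal{S}_n = s_n) \\
&\leq \sum_{s_n} p_{\mathcal{S}_n}(s_n) \left(\frac{1}{n} + P_e(s_n) \log|\mathcal{X}|\right) \\
&= \delta(n) + \mathbb{E}[P_e(\mathcal{S}_n)] \log|\mathcal{X}| \\
&\leq \delta_\epsilon(n),
\end{aligned}$$

where the last inequality holds whenever $R$ and $R_p$ satisfy (4.12). Therefore,

$$\frac{1}{n} H(A K Z^n | \mathcal{S}_n) \geq \frac{1}{n} H(Z^n X^n | \mathcal{S}_n) - \delta_\epsilon(n) = H(ZX) - \delta_\epsilon(n). \tag{4.15}$$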


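On substituting (4.14) and (4.15) into (4.13), we obtain

$$\mathbb{E}\left[\frac{1}{n} L(\mathcal{S}_n)\right] \leq R + R_p + H(Z) - H(ZX) + \delta_\epsilon(n) = R + R_p - H(X|Z) + \delta_\epsilon(n)$$

for any $R$ and $R_p$ satisfying (4.12). In particular, the choice

$$R_p = H(X|Y) + \delta(\epsilon) \quad \text{and} \quad R = H(X|Z) - H(X|Y) + \delta(\epsilon) \tag{4.16}$$

is compatible with the constraints (4.12) and yields

$$\mathbb{E}\left[\frac{1}{n} L(\mathcal{S}_n)\right] \leq \delta(\epsilon) + \delta_\epsilon(n).$$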


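Finally, we develop an upper bound for $\mathbb{E}[(1/n) U(\mathcal{S}_n)]$ by establishing a lower bound for $(1/n) H(K | \mathcal{S}_n)$. Let us introduce the random variable $\Theta$ such that

$$\Theta \triangleq \begin{cases} 1 & \text{if } X^n \in \mathcal{T}_\epsilon^n(X), \\ 0 & \text{otherwise.} \end{cases}$$

By construction, $\Theta$ is independent of $\mathcal{S}_n$ and, by the AEP, $\mathbb{P}[\Theta = 1] \geq 1 - \delta_\epsilon(n)$. Next, notice that

$$\begin{aligned}
\frac{1}{n} H(K | \mathcal{S}_n) &\geq \frac{1}{n} H(K | \mathcal{S}_n \Theta) \\
&\geq \mathbb{P}[\Theta = 1] \frac{1}{n} H(K | \mathcal{S}_n, \Theta = 1) \\
&\geq (1 - \delta_\epsilon(n)) \frac{1}{n} H(K | \mathcal{S}_n, \Theta = 1).
\end{aligned} \tag{4.17}$$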
For a specific strategy S,,, define Ks, as the random variable with distribution 


A 
PKs, = PK|Sn=Sn,5=1- 


Then H(K|S, = Sn, & = 1) = H(Ks,), which can be written explicitly in terms of the 
probability px, as 


1 1 
—H(Ks,) = = J ks, (4) log ps, (K). 
kek 
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By virtue of the symmetry of the random-binning construction, the quantity 
US, |- Pxs, (k) log pks, (k)] is independent of k; therefore, 


1 1 
iS,  H(Ks,)| = =e Zs |- prs, (k) log pKs, (4)] 


= Ds [- Prs, (1) log Prs, (1)]. (4.18) 


Intuitively, because we use a random binning in which keys are assigned uniformly at 
random, we expect px, (1) to be on the order of 2~”* for most strategies S„. This idea 
is formalized in the following lemma, whose proof is relegated to the appendix to this 
chapter. 


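Lemma 4.1. Let $\chi$ be a function of a key-distillation strategy $s_n$ defined as

$$\chi(s_n) \triangleq \begin{cases} 1 & \text{if } \big|p_{K_{s_n}}(1) - 2^{-nR}\big| \leq \epsilon 2^{-nR}, \\ 0 & \text{otherwise.} \end{cases}$$

If $R < H(X) - \delta(\epsilon)$, then $\mathbb{P}_{\mathcal{S}_n}[\chi(\mathcal{S}_n) = 1] \geq 1 - \delta_\epsilon(n)$.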
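The concentration described by Lemma 4.1 can be illustrated by simulation. The sketch below departs from the lemma by binning all $2^n$ sequences of an i.i.d. Bernoulli($q$) source rather than only the typical ones; it computes the resulting key distribution $p_K$ and the largest relative deviation $\max_k |p_K(k) - 2^{-nR}| / 2^{-nR}$, to be compared with $\epsilon$ in the definition of $\chi$. With $n = 16$, $q = 0.3$, and 16 keys, $nR = 4$ bits is far below $nH(X) \approx 14.1$ bits. The function names are hypothetical.

```python
import itertools
import random


def bin_probabilities(n, q, n_keys, seed=0):
    """p_K(k) when every binary x^n of an i.i.d. Bernoulli(q) source is assigned
    a key index uniformly at random (all sequences are binned, a simplification)."""
    rng = random.Random(seed)
    p_k = [0.0] * n_keys
    for x in itertools.product((0, 1), repeat=n):
        ones = sum(x)
        p_k[rng.randrange(n_keys)] += q ** ones * (1 - q) ** (n - ones)
    return p_k


def max_relative_deviation(p_k):
    """max_k |p_K(k) - 1/|K|| / (1/|K|), to be compared with epsilon in chi."""
    target = 1.0 / len(p_k)
    return max(abs(p - target) for p in p_k) / target


if __name__ == "__main__":
    for n_keys in (16, 1024):
        print(n_keys, max_relative_deviation(bin_probabilities(16, 0.3, n_keys)))
```

Increasing the number of keys toward $2^{nH(X)}$ should make the deviation grow, in line with the condition $R < H(X) - \delta(\epsilon)$ in the lemma.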


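Using Lemma 4.1 with $R$ as defined by (4.16), we then bound $\mathbb{E}_{\mathcal{S}_n}\big[-p_{K_{\mathcal{S}_n}}(1) \log p_{K_{\mathcal{S}_n}}(1)\big]$ as

$$\begin{aligned}
\mathbb{E}_{\mathcal{S}_n}\big[-p_{K_{\mathcal{S}_n}}(1) \log p_{K_{\mathcal{S}_n}}(1)\big] &\geq \mathbb{E}_{\mathcal{S}_n}\big[-p_{K_{\mathcal{S}_n}}(1) \log p_{K_{\mathcal{S}_n}}(1) \,\big|\, \chi(\mathcal{S}_n) = 1\big] \, \mathbb{P}_{\mathcal{S}_n}[\chi(\mathcal{S}_n) = 1] \\
&\geq (1 - \delta_\epsilon(n))(1 - \epsilon) 2^{-nR} \log\left(\frac{2^{nR}}{1 + \epsilon}\right).
\end{aligned} \tag{4.19}$$

On combining (4.17), (4.18), and (4.19), we obtain

$$\frac{1}{n} H(K | \mathcal{S}_n) \geq R - \delta(\epsilon) - \delta_\epsilon(n)$$

and, therefore,

$$\mathbb{E}\left[\frac{1}{n} U(\mathcal{S}_n)\right] \leq \delta_\epsilon(n) + \delta(\epsilon).$$

By applying the selection lemma to the random variable $\mathcal{S}_n$ and the functions $P_e$, $L$, and $U$, we conclude that there exists a specific strategy $s_n$ with rate $R$ given by (4.16) and such that

$$P_e(s_n) \leq \delta_\epsilon(n), \quad \frac{1}{n} L(s_n) \leq \delta_\epsilon(n) + \delta(\epsilon), \quad \text{and} \quad \frac{1}{n} U(s_n) \leq \delta_\epsilon(n) + \delta(\epsilon).$$

Hence, there exists a sequence of $(2^{nR}, n)$ key-distillation strategies $\{\mathcal{S}_n\}_{n \geq 1}$ with rate

$$R = H(X|Z) - H(X|Y) + \delta(\epsilon) = I(X;Y) - I(X;Z) + \delta(\epsilon)$$

and such that

$$\lim_{n \to \infty} P_e(\mathcal{S}_n) = 0, \quad \lim_{n \to \infty} \frac{1}{n} L(\mathcal{S}_n) \leq \delta(\epsilon), \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} U(\mathcal{S}_n) \leq \delta(\epsilon).$$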


Since $\epsilon$ can be chosen arbitrarily small, we conclude that the secret-key capacity is at least $I(X;Y) - I(X;Z)$.

Note that, if $I(X;Y) - I(Y;Z) > 0$, we can reproduce the proof by swapping the roles of Alice and Bob and swapping $X$ and $Y$ in the equations. Therefore, the secret-key capacity is also at least $I(Y;X) - I(Y;Z)$.


Remark 4.5. The existence of a public channel of unlimited capacity in the model is 
convenient because it allows us to focus solely on secrecy without having to account 
for the cost of communication and processing; however, this approach has a subtle 
drawback. In the real world, there exists no channel of unlimited capacity, and the 
public channel used in the model would be obtained through multiple uses of a side 
channel with finite capacity. Consequently, if a key-distillation strategy requires many 
rounds of communication over the public channel, the side channel would have to be 
used many times. Hence, the effective secret-key rate, obtained by normalizing the key 
size by the number of random realizations of the sources plus the number of uses of 
the side channel, may be much lower than what is predicted by the results obtained 
thus far. This motivates the study of key-distillation strategies with rate-limited public 
communication, that is, key-distillation strategies for which the messages exchanged over 
the public channel are subject to a rate constraint $R_p$. The construction of strategies
based on Slepian-Wolf codes described above does not allow us to control precisely the
rate of messages sent over the public channel. Actually, in (4.16), we implicitly required
the public rate $R_p$ to be at least $H(X|Y)$ (or $H(Y|X)$ if the roles of Alice and Bob are
swapped). The key idea to handle rate-limited public communication is to construct a
Slepian-Wolf-based strategy that operates on a quantized version of $X^n$ instead of on $X^n$
directly. The quantization allows us to control the rate of public communication and to
adjust it so that it falls below the rate constraint $R_p$. Specifically, one can combine the
strategy described above with a Wyner-Ziv compression scheme, which can be thought 
of as a special case of vector quantization. We refer the reader to the bibliographical 
notes for further references. 
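The sketch below makes the normalization in Remark 4.5 concrete. It computes an effective secret-key rate by dividing the key size by the number of source realizations plus the number of side-channel uses. For illustration only, it assumes that each side-channel use carries at most `side_capacity` bits. Under that assumption, `public_bits` bits of public discussion require $\lceil \text{public\_bits}/\text{side\_capacity} \rceil$ uses. The function name and the numbers in the usage line are hypothetical.

```python
from math import ceil


def effective_key_rate(key_bits, n_source, public_bits, side_capacity):
    """Key size divided by (source realizations + side-channel uses).

    Hypothetical model: each side-channel use conveys at most `side_capacity`
    bits reliably, so the public messages need ceil(public_bits / side_capacity) uses.
    """
    side_uses = ceil(public_bits / side_capacity)
    return key_bits / (n_source + side_uses)


# Illustrative numbers only (not from the text): a 200-bit key from 1000 source
# observations, with 800 bits of public discussion over a 1-bit-per-use side channel.
print(200 / 1000, round(effective_key_rate(200, 1000, 800, 1.0), 4))  # 0.2 0.1111
```

Even a modest amount of public discussion can noticeably lower the effective rate below the nominal rate computed per source observation.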


Upper bound for secret-key capacity 


In this section, we show that $C_s^{\mathrm{SM}} \leq \min(I(X;Y), I(X;Y|Z))$ with a converse argument.
Let R be an achievable weak secret-key rate and let $\epsilon > 0$. For n sufficiently large, there
exists a $(2^{nR}, n)$ key-distillation strategy $S_n$ such that


$$P_e(S_n) \leq \epsilon, \qquad \frac{1}{n}L(S_n) \leq \epsilon, \qquad \text{and} \qquad \frac{1}{n}U(S_n) \leq \epsilon.$$


For clarity, we drop the conditioning on the strategy $S_n$ in all subsequent calculations.
By virtue of Fano’s inequality, we have


$$\frac{1}{n}H(K|\hat{K}A^rB^rZ^n) \leq \frac{1}{n}H(K|\hat{K}) \leq \delta(P_e(S_n)) \leq \delta(\epsilon).$$




Secret-key capacity 


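First, we show that $R \leq I(X;Y|Z) + \delta(\epsilon)$. Note that
$$\begin{aligned}
R &\leq \frac{1}{n}\log\lceil 2^{nR}\rceil \\
&= \frac{1}{n}H(K) + \frac{1}{n}U(S_n) \\
&\leq \frac{1}{n}H(K) + \epsilon \\
&= \frac{1}{n}H(K|A^rB^rZ^n) + \frac{1}{n}L(S_n) + \epsilon \\
&\leq \frac{1}{n}H(K|A^rB^rZ^n) + \delta(\epsilon) \\
&\leq \frac{1}{n}H(K|A^rB^rZ^n) - \frac{1}{n}H(K|\hat{K}A^rB^rZ^n) + \delta(\epsilon) \\
&= \frac{1}{n}I(K;\hat{K}|A^rB^rZ^n) + \delta(\epsilon) \\
&\leq \frac{1}{n}I(X^nR_X;Y^nR_Y|A^rB^rZ^n) + \delta(\epsilon),
\end{aligned} \tag{4.20}$$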


where the last inequality follows from the data-processing inequality applied
to the Markov chain $K \rightarrow X^nR_XB^r \rightarrow Y^nR_YA^r \rightarrow \hat{K}$. We upper bound
$(1/n)I(X^nR_X;Y^nR_Y|A^rB^rZ^n)$ by using the following lemma.


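Lemma 4.2. Let $r \in \mathbb{N}^*$. Let $S \in \mathcal{S}$, $T \in \mathcal{T}$, $U \in \mathcal{U}$, $V^r \in \mathcal{V}^r$, and $W^r \in \mathcal{W}^r$ be random
vectors such that, for all $i \in [1, r]$,

$V_i$ is a function of $S$ and $W^{i-1}$,
$W_i$ is a function of $T$ and $V^{i-1}$.

Then, $I(S;T|V^rW^rU) \leq I(S;T|U)$.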


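Proof. For any $i \in [1, r]$, we upper bound $I(S;T|V^iW^iU)$ as follows:
$$\begin{aligned}
I(S;T|V^iW^iU) &\leq I(SV_i;T|V^{i-1}W^iU) \\
&\leq I(SV_i;TW_i|V^{i-1}W^{i-1}U) \\
&= I(S;TW_i|V^{i-1}W^{i-1}U) + I(V_i;TW_i|SV^{i-1}W^{i-1}U) \\
&= I(S;T|V^{i-1}W^{i-1}U) + I(S;W_i|V^{i-1}W^{i-1}TU) \\
&\quad + I(V_i;TW_i|SV^{i-1}W^{i-1}U).
\end{aligned} \tag{4.21}$$
Since $V_i$ is a function of $S$ and $W^{i-1}$ and $W_i$ is a function of $T$ and $V^{i-1}$, note that
$$I(V_i;TW_i|SV^{i-1}W^{i-1}U) = 0 \quad \text{and} \quad I(S;W_i|V^{i-1}W^{i-1}TU) = 0. \tag{4.22}$$
On substituting (4.22) into (4.21), we obtain
$$I(S;T|V^iW^iU) \leq I(S;T|V^{i-1}W^{i-1}U).$$
By induction over i, we conclude that $I(S;T|V^rW^rU) \leq I(S;T|U)$.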
By using Lemma 4.2 with $S \triangleq X^nR_X$, $T \triangleq Y^nR_Y$, $U \triangleq Z^n$, $V^r \triangleq A^r$, and $W^r \triangleq B^r$,
we obtain
$$I(X^nR_X;Y^nR_Y|A^rB^rZ^n) \leq I(X^nR_X;Y^nR_Y|Z^n).$$
Since $R_X$ and $R_Y$ are independent of the DMS $(\mathcal{XYZ}, p_{XYZ})$, we have
$$I(X^nR_X;Y^nR_Y|Z^n) = I(X^n;Y^n|Z^n) = nI(X;Y|Z);$$
therefore,
$$I(X^nR_X;Y^nR_Y|A^rB^rZ^n) \leq nI(X;Y|Z). \tag{4.23}$$
On substituting (4.23) into (4.20), we obtain
$$R \leq I(X;Y|Z) + \delta(\epsilon).$$


Finally, we show that $R \leq I(X;Y) + \delta(\epsilon)$. Note that, by assumption,
$$\frac{1}{n}I(K;A^rB^r) \leq \frac{1}{n}I(K;A^rB^rZ^n) = \frac{1}{n}L(S_n) \leq \delta(\epsilon).$$


Therefore, all the steps leading to (4.20) and (4.23) can be reapplied without conditioning on the variable $Z^n$, which yields $R \leq I(X;Y) + \delta(\epsilon)$. Since $\epsilon > 0$ can be chosen
arbitrarily small, we obtain the desired upper bound
$$C_s^{\mathrm{SM}} \leq \min(I(X;Y), I(X;Y|Z)).$$
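The converse bound can also be evaluated directly from the pmf. The minimal Python sketch below uses hypothetical helper names and takes the pmf as a dictionary over (x, y, z). It computes $\min(I(X;Y), I(X;Y|Z))$ using the identity $I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$. The two usage cases are chosen so that a different term is the binding one in each.

```python
from collections import defaultdict
from math import log2


def entropy_of(pxyz, coords):
    """Entropy in bits of the marginal of pxyz on the given coordinate indices."""
    marg = defaultdict(float)
    for xyz, p in pxyz.items():
        marg[tuple(xyz[i] for i in coords)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def sk_upper_bound(pxyz):
    """min(I(X;Y), I(X;Y|Z)) for a joint pmf {(x, y, z): p}; coordinates 0, 1, 2 = X, Y, Z."""
    def h(*coords):
        return entropy_of(pxyz, coords)

    i_xy = h(0) + h(1) - h(0, 1)
    i_xy_given_z = h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2)
    return min(i_xy, i_xy_given_z)


# Eve holds a copy of the shared bit: the term I(X;Y|Z) = 0 is binding.
copy = {(x, x, x): 0.5 for x in (0, 1)}
# Independent bits for Alice and Bob, Eve holds their XOR: the term I(X;Y) = 0 is binding.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(sk_upper_bound(copy), sk_upper_bound(xor))
```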


Alternative upper bounds for secret-key capacity 


In general, the upper bound $C_s^{\mathrm{SM}} \leq I(X;Y|Z)$ established in Theorem 4.1 is loose,
but the cause of this looseness is buried in the technical details of the proof; hence,
it is worth developing an intuitive understanding of the bound before we try to
improve it.

We start by showing that, for any source model, the quantity $I(X;Y|Z)$ admits an operational interpretation: it is the secret-key capacity obtained by providing Bob with an explicit advantage over Eve. In fact, consider a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ and secret-key capacity $C_s^{\mathrm{SM}}(p_{XYZ})$. Assume we provide an advantage to Bob by giving him access to Eve’s observation Z, which creates a new source model in which Bob observes $Y' = (Y, Z)$ instead of Y. Since a key-distillation strategy for the original source model remains a key-distillation strategy for the new source model, it holds that $C_s^{\mathrm{SM}}(p_{XY'Z}) \geq C_s^{\mathrm{SM}}(p_{XYZ})$; however, because $X \rightarrow Y' \rightarrow Z$ forms a Markov chain, Corollary 4.1 applies and
$$C_s^{\mathrm{SM}}(p_{XY'Z}) = I(X;Y'|Z) = I(X;Y|Z).$$


On the basis of this operational interpretation, a natural approach to improve the bound
$C_s^{\mathrm{SM}}(p_{XYZ}) \leq I(X;Y|Z)$ is to reduce the advantage given to Bob.

A first possibility to mitigate Bob’s advantage is to analyze more precisely how 
Eve could further process her observations. Specifically, consider a key-distillation 
strategy $S_n$ for a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$, with key K and public messages $A^rB^r$. We allow Eve to send her observations $Z^n$ through an arbitrary DMC $(\mathcal{Z}, p_{\bar{Z}|Z}, \bar{\mathcal{Z}})$, which results in a new source model with DMS $(\mathcal{XY\bar{Z}}, p_{XY\bar{Z}})$. Because $KA^rB^r \rightarrow Z^n \rightarrow \bar{Z}^n$ forms a Markov chain, the data-processing inequality ensures that
$$\frac{1}{n}I(K;A^rB^r\bar{Z}^n|S_n) \leq \frac{1}{n}I(K;A^rB^rZ^n|S_n) = \frac{1}{n}L(S_n).$$
Hence, K is also a secret key for the new source model and $C_s^{\mathrm{SM}}(p_{XYZ}) \leq C_s^{\mathrm{SM}}(p_{XY\bar{Z}})$. By virtue of Theorem 4.1, we also have $C_s^{\mathrm{SM}}(p_{XY\bar{Z}}) \leq I(X;Y|\bar{Z})$. Since the DMC $(\mathcal{Z}, p_{\bar{Z}|Z}, \bar{\mathcal{Z}})$ is arbitrary, it must hold that
$$C_s^{\mathrm{SM}}(p_{XYZ}) \leq \inf_{p_{\bar{Z}|Z}} I(X;Y|\bar{Z}).$$
This inequality motivates the definition of a new measure of information. 


Definition 4.7. For a DMS $(\mathcal{XYZ}, p_{XYZ})$, the intrinsic conditional information between X and Y given Z is
$$I(X;Y\downarrow Z) \triangleq \inf_{p_{\bar{Z}|Z}} I(X;Y|\bar{Z}).$$


Intuitively, the intrinsic conditional information measures the information between 
X and Y that remains after Eve has chosen the best memoryless processing of the 
observation Z. The following theorem shows that the intrinsic conditional information is an upper bound for the secret-key capacity that is at least as tight as the upper bound of Theorem 4.1.


Theorem 4.2 (Maurer). The secret-key capacity of a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ satisfies
$$C_s^{\mathrm{SM}} \leq I(X;Y\downarrow Z) \leq \min(I(X;Y), I(X;Y|Z)).$$


Proof. The inequality $C_s^{\mathrm{SM}} \leq \inf_{p_{\bar{Z}|Z}} I(X;Y|\bar{Z})$ has already been proved in the paragraphs above. To establish the second inequality, note that $I(X;Y\downarrow Z) \leq I(X;Y|\bar{Z})$ for any choice of transition probabilities $p_{\bar{Z}|Z}$; therefore, we prove that $I(X;Y\downarrow Z) \leq \min(I(X;Y), I(X;Y|Z))$ by constructing specific DMCs $(\mathcal{Z}, p_{\bar{Z}|Z}, \bar{\mathcal{Z}})$ for which $I(X;Y|\bar{Z})$ takes the values $I(X;Y)$ and $I(X;Y|Z)$:

• if we set $\bar{\mathcal{Z}} = \mathcal{Z}$ and $p_{\bar{Z}|Z}(\bar{z}|z) = 1/|\mathcal{Z}|$ for all $(z, \bar{z})$, then $\bar{Z}$ is independent of X, Y, and Z so that $I(X;Y|\bar{Z}) = I(X;Y)$;

• if we set $p_{\bar{Z}|Z}(\bar{z}|z) = \mathbb{1}(\bar{z} = z)$, then $\bar{Z} = Z$ with probability one and $I(X;Y|\bar{Z}) = I(X;Y|Z)$.

Therefore,
$$I(X;Y\downarrow Z) \leq \min(I(X;Y), I(X;Y|Z)).$$
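Because $I(X;Y\downarrow Z)$ is an infimum over channels, every particular channel $p_{\bar{Z}|Z}$ gives an upper bound on it, and hence on the secret-key capacity. The sketch below is a crude numerical illustration, not an algorithm from the text. It evaluates $I(X;Y|\bar{Z})$ for the two channels used in the proof above and for randomly drawn channels, then returns the smallest value found. Restricting the channel output alphabet to $|\mathcal{Z}|$ symbols is an assumption of this sketch. The result is only an upper estimate of the infimum, and the helper names are hypothetical.

```python
import random
from collections import defaultdict
from math import log2


def cond_mi(pxyw):
    """I(X;Y|W) in bits for a joint pmf {(x, y, w): p}."""
    def h(*coords):
        marg = defaultdict(float)
        for key, p in pxyw.items():
            marg[tuple(key[i] for i in coords)] += p
        return -sum(p * log2(p) for p in marg.values() if p > 0)

    return h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2)


def through_channel(pxyz, channel):
    """Joint pmf of (X, Y, Zbar) when the index z is sent through channel[z][zbar]."""
    out = defaultdict(float)
    for (x, y, z), p in pxyz.items():
        for zbar, w in enumerate(channel[z]):
            if w > 0:
                out[(x, y, zbar)] += p * w
    return out


def intrinsic_info_upper_estimate(pxyz, z_alphabet, trials=1000, seed=0):
    """Smallest I(X;Y|Zbar) over a set of candidate channels p(zbar|z).

    Each value upper-bounds I(X;Y down Z); the minimum is an upper estimate of
    the infimum, not the infimum itself. Output alphabet size |Z| is assumed.
    """
    rng = random.Random(seed)
    size = len(z_alphabet)
    index = {z: i for i, z in enumerate(z_alphabet)}
    pxyz = {(x, y, index[z]): p for (x, y, z), p in pxyz.items()}

    # The two channels from the proof: output independent of Z, and identity.
    candidates = [
        [[1.0 / size] * size for _ in range(size)],
        [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)],
    ]
    for _ in range(trials):
        channel = []
        for _ in range(size):
            weights = [rng.random() for _ in range(size)]
            total = sum(weights)
            channel.append([w / total for w in weights])
        candidates.append(channel)
    return min(cond_mi(through_channel(pxyz, ch)) for ch in candidates)
```

Because the identity and independent-output channels are always among the candidates, the returned value never exceeds $\min(I(X;Y), I(X;Y|Z))$. Random search may still miss the channel achieving the infimum.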


The following example shows that Theorem 4.2 is useful because $I(X;Y\downarrow Z)$ can be
strictly smaller than $I(X;Y|Z)$.
Example 4.1. Let $\mathcal{X} = \mathcal{Y} = \{0, 1, 2, 3\}$ and let $p_{XY}$ be defined in the table below.
$$\begin{array}{c|cccc}
X \backslash Y & 0 & 1 & 2 & 3 \\ \hline
0 & 1/8 & 1/8 & 0 & 0 \\
1 & 1/8 & 1/8 & 0 & 0 \\
2 & 0 & 0 & 1/4 & 0 \\
3 & 0 & 0 & 0 & 1/4
\end{array}$$
Define Z as
$$Z \triangleq \begin{cases} X \oplus Y & \text{if } X \in \{0, 1\}, \\ X & \text{if } X \in \{2, 3\}. \end{cases}$$
One can verify that $I(X;Y) = 3/2$ and $I(X;Y|Z) = 1/2$. The particularity of this DMS is that, for $X \in \{0, 1\}$, individual knowledge of Z or Y does not resolve any uncertainty about X but joint knowledge of Z and Y determines X completely. Consider now the channel $(\mathcal{Z}, p_{\bar{Z}|Z}, \mathcal{Z})$ such that
$$p_{\bar{Z}|Z}(0|0) = p_{\bar{Z}|Z}(0|1) = p_{\bar{Z}|Z}(1|0) = p_{\bar{Z}|Z}(1|1) = \frac{1}{2}, \qquad p_{\bar{Z}|Z}(2|2) = p_{\bar{Z}|Z}(3|3) = 1.$$
Notice that, if $Z \in \{0, 1\}$, then $\bar{Z}$ is obtained by sending Z through a BSC with cross-over probability 1/2 and, therefore, $\bar{Z}$ becomes independent of Z. As a result, for $X \in \{0, 1\}$, knowledge of $\bar{Z}$ still does not resolve any uncertainty about X, but knowledge of both $\bar{Z}$ and Y does not resolve any uncertainty, either. Hence, $I(X;Y|\bar{Z}) = 0$ and, consequently, $I(X;Y\downarrow Z) = 0$ and $C_s^{\mathrm{SM}} = 0$.
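The values claimed in Example 4.1 can be checked by direct computation. The following Python sketch uses hypothetical helper names and is meant as an illustration. It builds the joint pmf of (X, Y, Z) from the table and the definition of Z, applies the channel $p_{\bar{Z}|Z}$ of the example, and evaluates $I(X;Y)$, $I(X;Y|Z)$, and $I(X;Y|\bar{Z})$.

```python
from collections import defaultdict
from math import log2


def entropy_of(pmf, coords):
    """Entropy in bits of the marginal of pmf on the given coordinate indices."""
    marg = defaultdict(float)
    for key, p in pmf.items():
        marg[tuple(key[i] for i in coords)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def mi_xy(pmf):
    """I(X;Y) for a pmf over (x, y, w)."""
    return entropy_of(pmf, (0,)) + entropy_of(pmf, (1,)) - entropy_of(pmf, (0, 1))


def cmi_xy_given_w(pmf):
    """I(X;Y|W) for a pmf over (x, y, w)."""
    return (entropy_of(pmf, (0, 2)) + entropy_of(pmf, (1, 2))
            - entropy_of(pmf, (0, 1, 2)) - entropy_of(pmf, (2,)))


# Table of Example 4.1
P_XY = {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 1/8, (2, 2): 1/4, (3, 3): 1/4}
# Z = X xor Y if X in {0, 1}, Z = X if X in {2, 3}
p_xyz = {(x, y, x ^ y if x < 2 else x): p for (x, y), p in P_XY.items()}

# Channel of the example: Z in {0, 1} goes through a BSC(1/2); 2 and 3 are kept.
CHANNEL = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}, 2: {2: 1.0}, 3: {3: 1.0}}
p_xyzbar = defaultdict(float)
for (x, y, z), p in p_xyz.items():
    for zbar, w in CHANNEL[z].items():
        p_xyzbar[(x, y, zbar)] += p * w

print(mi_xy(p_xyz), cmi_xy_given_w(p_xyz), cmi_xy_given_w(p_xyzbar))  # 1.5 0.5 0.0
```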


A second possibility to improve the result of Theorem 4.2 is not only to analyze 
how Eve could further process her observations but also to provide her with some side 
information represented by a correlated DMS $(\mathcal{U}, p_U)$; however, the side information
must be introduced carefully if we want to retain an upper bound for $C_s^{\mathrm{SM}}(p_{XYZ})$ because,
in general, the secret-key capacity is reduced if Eve has access to side information.


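Proposition 4.1. Consider a source model with DMS $(\mathcal{XY\tilde{Z}}, p_{XY\tilde{Z}})$ in which the
eavesdropper has access to the observations $\tilde{Z} = (Z, U) \in \mathcal{Z} \times \mathcal{U}$. Then,
$$C_s^{\mathrm{SM}}(p_{XY\tilde{Z}}) \geq C_s^{\mathrm{SM}}(p_{XYZ}) - H(U).$$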


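Proof. We prove the inequality by constructing a key-distillation strategy for the DMS
$(\mathcal{XY\tilde{Z}}, p_{XY\tilde{Z}})$ from a key-distillation strategy for the DMS $(\mathcal{XYZ}, p_{XYZ})$.

Let R be an achievable weak secret-key rate for the DMS $(\mathcal{XYZ}, p_{XYZ})$. For any
$\epsilon > 0$, there exists a $(2^{nR}, n)$ key-distillation strategy $S_n$ such that
$$P_e(S_n) \leq \delta(\epsilon), \qquad \frac{1}{n}L(S_n) \leq \delta(\epsilon), \qquad \text{and} \qquad \frac{1}{n}U(S_n) \leq \delta(\epsilon). \tag{4.24}$$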
Using Fano’s inequality, we obtain
$$\frac{1}{n}H(K|\hat{K}S_n) \leq \delta(P_e(S_n)) \leq \delta(\epsilon). \tag{4.25}$$

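In addition,
$$\begin{aligned}
\frac{1}{n}I(K;A^rB^rZ^nU^n|S_n) &= \frac{1}{n}I(K;A^rB^rZ^n|S_n) + \frac{1}{n}I(K;U^n|A^rB^rZ^nS_n) \\
&\leq \frac{1}{n}I(K;A^rB^rZ^n|S_n) + \frac{1}{n}H(U^n) \\
&\leq \frac{1}{n}L(S_n) + H(U) \\
&\leq \delta(\epsilon) + H(U).
\end{aligned} \tag{4.26}$$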


We now turn our attention back to the DMS $(\mathcal{XY\tilde{Z}}, p_{XY\tilde{Z}})$. We construct a key-distillation strategy by first running m independent repetitions of the key-distillation strategy $S_n$, from which Alice obtains i.i.d. sequences $K^m$, Bob obtains i.i.d. sequences $\hat{K}^m$, and Eve observes $A^{rm}$, $B^{rm}$, $Z^{nm}$, and $U^{nm}$. Effectively, this creates a source model with DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ in which $X' \triangleq K$, $Y' \triangleq \hat{K}$, and $Z' \triangleq A^rB^rZ^nU^n$. Since a key-distillation strategy for the DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ is a specific instance of key-distillation strategy for the DMS $(\mathcal{XY\tilde{Z}}, p_{XY\tilde{Z}})$, we have
$$C_s^{\mathrm{SM}}(p_{XY\tilde{Z}}) \geq \frac{1}{n}C_s^{\mathrm{SM}}(p_{X'Y'Z'}), \tag{4.27}$$
where the normalization by n appears because each realization of the DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ relies on n realizations of the DMS $(\mathcal{XY\tilde{Z}}, p_{XY\tilde{Z}})$. Note that Theorem 4.1 guarantees that
$$C_s^{\mathrm{SM}}(p_{X'Y'Z'}) \geq I(X';Y') - I(X';Z').$$
Next, we use (4.24), (4.25), and (4.26) to lower bound $I(X';Y') - I(X';Z')$ as
$$\begin{aligned}
I(X';Y') - I(X';Z') &= I(K;\hat{K}|S_n) - I(K;A^rB^rZ^nU^n|S_n) \\
&= H(K|S_n) - H(K|\hat{K}S_n) - I(K;A^rB^rZ^nU^n|S_n) \\
&\geq nR - nH(U) - n\delta(\epsilon).
\end{aligned}$$
All in all, we obtain
$$C_s^{\mathrm{SM}}(p_{XY\tilde{Z}}) \geq R - H(U) - \delta(\epsilon).$$
Since $\epsilon > 0$ can be arbitrarily small and R can be chosen arbitrarily close to $C_s^{\mathrm{SM}}(p_{XYZ})$, we get the desired result
$$C_s^{\mathrm{SM}}(p_{XY\tilde{Z}}) \geq C_s^{\mathrm{SM}}(p_{XYZ}) - H(U).$$


Proposition 4.1 means that providing Eve with side information U reduces the secret- 
key capacity by at most H(U). This motivates the definition of another measure of 
information. 
Definition 4.8. For a DMS $(\mathcal{XYZ}, p_{XYZ})$, the reduced intrinsic conditional information
between X and Y given Z is
$$I(X;Y\downarrow\downarrow Z) \triangleq \inf_{p_{U|XYZ}} \left(I(X;Y\downarrow ZU) + H(U)\right).$$


Intuitively, the reduced intrinsic conditional information measures the information 
between X and Y that remains after the best memoryless processing of Z and with 
the best memoryless side information U; however, the term $H(U)$ is introduced to
compensate for the decrease of $I(X;Y\downarrow Z)$ caused by the side information. The next
theorem shows that $I(X;Y\downarrow\downarrow Z)$ is an upper bound for $C_s^{\mathrm{SM}}$ that is at least as good as
$I(X;Y\downarrow Z)$.

Theorem 4.3 (Renner and Wolf). The secret-key capacity of a source model with DMS
$(\mathcal{XYZ}, p_{XYZ})$ satisfies
$$C_s^{\mathrm{SM}} \leq I(X;Y\downarrow\downarrow Z) \leq I(X;Y\downarrow Z).$$


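Proof. The inequality $I(X;Y\downarrow\downarrow Z) \leq I(X;Y\downarrow Z)$ follows on choosing $U = 0$ (a constant, so that $H(U) = 0$) in the definition of $I(X;Y\downarrow\downarrow Z)$. The inequality $C_s^{\mathrm{SM}} \leq I(X;Y\downarrow\downarrow Z)$ follows on noting that, for any U, Proposition 4.1 and Theorem 4.2 ensure that, with $\tilde{Z} = (Z, U)$,
$$C_s^{\mathrm{SM}}(p_{XYZ}) \leq C_s^{\mathrm{SM}}(p_{XY\tilde{Z}}) + H(U) \leq I(X;Y\downarrow ZU) + H(U)$$
and, therefore,
$$C_s^{\mathrm{SM}}(p_{XYZ}) \leq \inf_{p_{U|XYZ}} \left(I(X;Y\downarrow ZU) + H(U)\right) = I(X;Y\downarrow\downarrow Z).$$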


As shown in the following example, the result of Theorem 4.3 is useful because
$I(X;Y\downarrow\downarrow Z)$ can provide a better bound than can $I(X;Y\downarrow Z)$.


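Example 4.2. Let $\mathcal{X} = \mathcal{Y} = \{0, 1, 2, 3\}$ and let us consider again the joint probability distribution $p_{XY}$ defined in the table below.
$$\begin{array}{c|cccc}
X \backslash Y & 0 & 1 & 2 & 3 \\ \hline
0 & 1/8 & 1/8 & 0 & 0 \\
1 & 1/8 & 1/8 & 0 & 0 \\
2 & 0 & 0 & 1/4 & 0 \\
3 & 0 & 0 & 0 & 1/4
\end{array}$$
Let Z be defined as
$$Z \triangleq \begin{cases} X \oplus Y & \text{if } X \in \{0, 1\}, \\ X \bmod 2 & \text{if } X \in \{2, 3\}. \end{cases}$$
In contrast to Example 4.1, knowledge of Z never fully resolves the uncertainty about X or Y. One can verify that $I(X;Y) = I(X;Y|Z) = 3/2$. We now introduce the side information U as
$$U \triangleq \begin{cases} 0 & \text{if } X \in \{0, 1\}, \\ 1 & \text{if } X \in \{2, 3\}. \end{cases}$$
Let us consider the channel $(\mathcal{Z} \times \mathcal{U}, p_{\bar{Z}|ZU}, \{0, 1, 2\})$ such that
$$p_{\bar{Z}|ZU}(2|0,0) = p_{\bar{Z}|ZU}(2|1,0) = p_{\bar{Z}|ZU}(0|0,1) = p_{\bar{Z}|ZU}(1|1,1) = 1.$$
One can check that $I(X;Y|\bar{Z}) = 0$ and therefore $I(X;Y\downarrow ZU) = 0$ as well; since $H(U) = 1$, this yields $I(X;Y\downarrow\downarrow Z) \leq 1$.
To establish that $I(X;Y\downarrow Z) = 3/2$, consider an arbitrary channel $(\mathcal{Z}, p_{\bar{Z}|Z}, \bar{\mathcal{Z}})$ and consider an output symbol $\bar{z}^*$. Let
$$p \triangleq p_{\bar{Z}|Z}(\bar{z}^*|0) \quad \text{and} \quad q \triangleq p_{\bar{Z}|Z}(\bar{z}^*|1).$$
One can check that
$$\begin{aligned}
p_{XY|\bar{Z}}(0,0|\bar{z}^*) = p_{XY|\bar{Z}}(1,1|\bar{z}^*) &= \frac{p}{4(p+q)}, \\
p_{XY|\bar{Z}}(0,1|\bar{z}^*) = p_{XY|\bar{Z}}(1,0|\bar{z}^*) &= \frac{q}{4(p+q)}, \\
p_{XY|\bar{Z}}(2,2|\bar{z}^*) = \frac{p}{2(p+q)}, \qquad p_{XY|\bar{Z}}(3,3|\bar{z}^*) &= \frac{q}{2(p+q)},
\end{aligned}$$
and $I(X;Y|\bar{Z} = \bar{z}^*) = 3/2$. Consequently, $I(X;Y|\bar{Z}) = 3/2$ as well, and, since the channel $(\mathcal{Z}, p_{\bar{Z}|Z}, \bar{\mathcal{Z}})$ was arbitrary, $I(X;Y\downarrow Z) = 3/2$.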
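A similar check applies to Example 4.2. The sketch below takes the side information U = 0 for $X \in \{0, 1\}$ and U = 1 for $X \in \{2, 3\}$, as defined in the example. Eve’s channel outputs 2 when U = 0 and forwards Z when U = 1. The sketch computes $I(X;Y|Z)$, $I(X;Y|\bar{Z})$, and $H(U)$. It then evaluates $I(X;Y|\bar{Z})$ for one randomly drawn channel acting on Z alone, which, according to the argument above, should equal 3/2 whatever the channel. Helper names, the seed, and the random channel are illustrative.

```python
import random
from collections import defaultdict
from math import log2


def entropy_of(pmf, coords):
    """Entropy in bits of the marginal of pmf on the given coordinate indices."""
    marg = defaultdict(float)
    for key, p in pmf.items():
        marg[tuple(key[i] for i in coords)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cmi_xy_given_w(pmf):
    """I(X;Y|W) in bits for a pmf over (x, y, w)."""
    return (entropy_of(pmf, (0, 2)) + entropy_of(pmf, (1, 2))
            - entropy_of(pmf, (0, 1, 2)) - entropy_of(pmf, (2,)))


P_XY = {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 1/8, (2, 2): 1/4, (3, 3): 1/4}
# Z = X xor Y if X in {0, 1}, Z = X mod 2 if X in {2, 3}
p_xyz = {(x, y, x ^ y if x < 2 else x % 2): p for (x, y), p in P_XY.items()}

# Side information U (0 on the {0,1} block, 1 on the {2,3} block) and Eve's
# channel on (Z, U): output 2 when U = 0, output Z when U = 1.
p_u = defaultdict(float)
p_xyzbar = defaultdict(float)
for (x, y, z), p in p_xyz.items():
    u = x // 2
    p_u[(u,)] += p
    p_xyzbar[(x, y, 2 if u == 0 else z)] += p
h_u = entropy_of(p_u, (0,))

# An arbitrary channel acting on Z alone, drawn at random (illustration only).
rng = random.Random(7)
rows = []
for _ in range(2):
    weights = [rng.random() for _ in range(3)]
    total = sum(weights)
    rows.append([w / total for w in weights])
p_random = defaultdict(float)
for (x, y, z), p in p_xyz.items():
    for zbar, w in enumerate(rows[z]):
        p_random[(x, y, zbar)] += p * w

print(cmi_xy_given_w(p_xyz), cmi_xy_given_w(p_xyzbar), h_u)  # 1.5 0.0 1.0
print(round(cmi_xy_given_w(p_random), 9))  # 1.5
```

The first line reflects the bound $I(X;Y\downarrow\downarrow Z) \leq 0 + H(U) = 1$, which is strictly smaller than $I(X;Y\downarrow Z) = 3/2$.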


Note that neither the intrinsic conditional information nor the reduced intrinsic conditional information really helps determine a generic expression for $C_s^{\mathrm{SM}}$.


Sequential key distillation for the source model 


The analysis of wiretap codes and key-distillation strategies is complex because messages 
or keys are subject to simultaneous reliability and secrecy constraints. Random-coding 
and random-binning arguments allow us to circumvent this difficulty and to establish 
achievability results, but they provide limited insight into the design of practical schemes. 
Even the proof in Section 4.2.2, which exploits a strong connection between key-distillation strategies and Slepian-Wolf codes, implicitly requires the key-distillation
function and the public-message-encoding function to be designed jointly; this joint 
design makes it difficult to find such functions in practice. Hence, to further simplify 
the design of practical schemes, it is legitimate to wonder whether one could handle the 
reliability and secrecy requirements independently. This makes little sense for wiretap 
codes, but the idea is not totally contrived for key-distillation strategies because keys 
are random sequences that do not carry any information by themselves; there is a lot of 
leeway in the construction of key-distillation strategies and Alice and Bob are free to 
remove, combine, or shuffle their observations. 

In this section, we show that, for a source model, it is indeed possible to design key- 
distillation strategies that handle reliability and secrecy independently. Such strategies, 
which we call sequential key-distillation strategies because they operate in sequential 
phases, play a particularly important role for three reasons: 


• they incur no loss of optimality, since they can achieve all rates below the secret-key
capacity;


4.3 Sequential key distillation 135 


[Figure: bar heights showing the information held by Alice, Bob, and Eve after the advantage-distillation, information-reconciliation, and privacy-amplification phases]


Figure 4.4 Evolution of information during the phases of a sequential key-generation strategy. 


• they achieve strong secret-key rates, which allow us to prove that the strong and weak secret-key capacities are equal;
• their analysis eventually leads to explicit and practical constructions.


Specifically, a sequential key-distillation strategy is a key-distillation strategy that oper- 
ates in four successive phases. 


1. Randomness sharing. Alice, Bob, and Eve observe n realizations of a DMS 
(XYZ, Pxyz). 

2. Advantage distillation. If needed, Alice and Bob exchange messages over the public 
channel to process their observations and to “distill” observations for which they 
have an advantage over Eve. 

3. Information reconciliation. Alice and Bob exchange messages over the public channel 
to process their observations and agree on a common bit sequence. 

4. Privacy amplification. Alice and Bob publicly agree on a deterministic function they 
apply to their common sequence to generate a secret key. 


Before we describe and analyze these phases precisely, it is useful to understand intu- 
itively the role played by each of them in the key-distillation strategy. Figure 4.4 illustrates 
the evolution of each party’s information about Alice’s initial source observations dur- 
ing the different phases. The amount of information is represented qualitatively by the 
height of the bars in the figure. After the randomness-sharing phase, we assume that Eve 
has an advantage over Bob; hence, Bob’s information is lower than Eve’s information. 
During the advantage-distillation phase, Alice and Bob interact over the public channel 
to distill observations for which they have an advantage over Eve. Since the observations 
for which Eve has an advantage are discarded, Alice’s information decreases; however, 
Bob’s information now exceeds Eve’s information. During the information-reconciliation 
phase, Alice provides Bob with side information that enables him to correct all the dis- 
crepancies between his sequence and Alice’s; as a result, Bob’s information increases 
to reach the level of Alice’s, but, since the error-correction information is public, Eve’s
information increases as well. Finally, during the privacy-amplification phase, Alice and
Bob generate a smaller sequence about which Eve has no information.

Figure 4.5 Satellite source model.


Advantage distillation 


For some source models, the lower bound for the secret-key capacity provided by 
Theorem 4.1 is negative, and hence useless. Figure 4.5 illustrates such a source model, 
which is commonly called a “satellite” source model. It consists of a DMS $(\mathcal{U}, p_U)$ with
$U \sim \mathcal{B}(1/2)$ that broadcasts sequences of bits to Alice, Bob, and Eve through independent
binary symmetric channels with respective cross-over probabilities $p > 0$, $q > 0$, and
$r > 0$. This source model can be thought of as a satellite transmitting to three base
stations on Earth, one of which is an eavesdropper. It is assumed that Eve’s cross-over
probability r satisfies $r < p$ and $r < q$ so that
$$I(X;Y) < I(X;Z) \quad \text{and} \quad I(X;Y) < I(Y;Z).$$


In other words, Eve has an advantage over both Alice and Bob because the mutual 
information between Z and X and the mutual information between Z and Y are higher 
than that between X and Y. 
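In the satellite model, each pair of observations is linked through U by a cascade of two BSCs. For instance, X and Y are related by a BSC with cross-over probability $p(1-q) + (1-p)q$, so that $I(X;Y) = 1 - H_b(p(1-q) + (1-p)q)$ since X is uniform. The sketch below uses this to compare $I(X;Y)$, $I(X;Z)$, and $I(Y;Z)$ for the parameters $p = q = 0.2$ and $r = 0.15$ of Figure 4.6. It also checks that the lower bound of Theorem 4.1 is then negative. The helper names are hypothetical.

```python
from math import log2


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def cascade(a, b):
    """Cross-over probability of two BSCs in cascade."""
    return a * (1 - b) + (1 - a) * b


def satellite_informations(p, q, r):
    """(I(X;Y), I(X;Z), I(Y;Z)) for the satellite model with U ~ B(1/2)."""
    return (1 - h2(cascade(p, q)), 1 - h2(cascade(p, r)), 1 - h2(cascade(q, r)))


i_xy, i_xz, i_yz = satellite_informations(0.2, 0.2, 0.15)
lower_bound = max(i_xy - i_xz, i_xy - i_yz)   # lower bound of Theorem 4.1
print(f"I(X;Y)={i_xy:.4f}  I(X;Z)={i_xz:.4f}  I(Y;Z)={i_yz:.4f}  bound={lower_bound:.4f}")
```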

The basic premise of advantage distillation is that Alice and Bob may reverse Eve’s 
advantage by exchanging messages over the public channel. In fact, the mutual information $I(X;Z)$ or $I(Y;Z)$ measures only Eve’s average advantage over Alice and Bob.
Although $I(X;Y) < I(X;Z)$ and $I(X;Y) < I(Y;Z)$, there may exist some realizations of
the DMS for which Eve’s observations are only loosely correlated with Alice’s and Bob’s. One
can think of an advantage-distillation protocol as a procedure to distill the realizations 
for which Alice and Bob have an advantage over Eve. Formally, an advantage-distillation 
protocol is defined as follows. 


Definition 4.9. An advantage-distillation protocol $D_n$ for a source model with DMS
$(\mathcal{XYZ}, p_{XYZ})$ consists of

• two alphabets $\mathcal{X}'$ and $\mathcal{Y}'$;

• a source of local randomness $(\mathcal{R}_X, p_{R_X})$ for Alice;

• a source of local randomness $(\mathcal{R}_Y, p_{R_Y})$ for Bob;

• an integer $r \in \mathbb{N}^*$ that represents the number of rounds of communication;

• r encoding functions $f_i : \mathcal{X}^n \times \mathcal{B}^{i-1} \times \mathcal{R}_X \to \mathcal{A}$ for $i \in [1, r]$;

• r encoding functions $g_i : \mathcal{Y}^n \times \mathcal{A}^{i-1} \times \mathcal{R}_Y \to \mathcal{B}$ for $i \in [1, r]$;

• a function $\theta_X : \mathcal{X}^n \times \mathcal{B}^r \times \mathcal{R}_X \to \mathcal{X}'$;

• a function $\theta_Y : \mathcal{Y}^n \times \mathcal{A}^r \times \mathcal{R}_Y \to \mathcal{Y}'$;

and operates as follows:

• Alice observes n realizations of the source $x^n$ while Bob observes $y^n$;

• Alice generates a realization $r_x$ of her source of local randomness while Bob generates $r_y$ from his;

• in round $i \in [1, r]$, Alice transmits $a_i = f_i(x^n, b^{i-1}, r_x)$ while Bob transmits $b_i = g_i(y^n, a^{i-1}, r_y)$;

• after round r, Alice distills $x' = \theta_X(x^n, b^r, r_x)$ while Bob distills $y' = \theta_Y(y^n, a^r, r_y)$.


By convention, $A^0 \triangleq 0$ and $B^0 \triangleq 0$. The number of rounds r and the sources of local
randomness can be optimized during the design of the advantage-distillation protocol. 
By repeating an advantage-distillation protocol multiple times, Alice and Bob distill the
realizations of a new DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ with components $X'$, $Y'$, and $Z' \triangleq Z^nA^rB^r$.
Ideally, this new DMS provides Alice and Bob with an advantage over Eve in the sense
that
$$I(X';Y') > I(X';Z') \quad \text{or} \quad I(X';Y') > I(Y';Z').$$


Hence, it is natural to measure the performance of an advantage-distillation protocol in 
terms of the quantity 


$$R(D_n) \triangleq \frac{1}{n}\max\left(I(X';Y') - I(X';Z'),\; I(X';Y') - I(Y';Z')\right),$$
which we call the advantage-distillation rate.² Notice that we have introduced a normalization by n to express the rate in bits per observation of the DMS $(\mathcal{XYZ}, p_{XYZ})$. The
advantage-distillation rate captures an inherent trade-off in the design of an advantage- 
distillation protocol. On the one hand, Alice and Bob want to exchange messages to 
maximize I(X’; Y’), that is, they want to extract the observations for which their real- 
izations are highly correlated; on the other hand, they must also minimize I(X’; Z’) or 
I(Y'; Z’), that is, they must choose their messages carefully in order to avoid revealing 
the values of their observations to Eve. 


Definition 4.10. An advantage-distillation rate R is achievable if there exists a sequence
of advantage-distillation protocols $\{D_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty} R(D_n) \geq R.$$
Definition 4.11. The advantage-distillation capacity $D^{\mathrm{SM}}$ of a source model with DMS
$(\mathcal{XYZ}, p_{XYZ})$ is
$$D^{\mathrm{SM}} \triangleq \sup\{R : R \text{ is an achievable advantage-distillation rate}\}.$$


² Our definition allows an advantage-distillation rate to take negative values, which of course has little
interest. 




When required, we write $D^{\mathrm{SM}}(p_{XYZ})$ in place of $D^{\mathrm{SM}}$ to specify explicitly the distribution of the DMS. We do not attempt to characterize the advantage-distillation capacity of a source model exactly; rather, we relate it to the secret-key capacity of the same source model.

Proposition 4.2 (Muramatsu). For a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$,
$$D^{\mathrm{SM}} = C_s^{\mathrm{SM}}.$$


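Proof. A $(2^{nR}, n)$ key-distillation strategy for a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ can be viewed as an advantage-distillation protocol for which $X' = K$, $Y' = \hat{K}$, and $Z' = A^rB^rZ^n$. If a secret-key rate R is achievable, there exists a sequence of key-distillation strategies $\{S_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty} P_e(S_n) = 0, \qquad \lim_{n\to\infty} \frac{1}{n}L(S_n) = 0, \qquad \text{and} \qquad \lim_{n\to\infty} \frac{1}{n}U(S_n) = 0.$$
By virtue of Fano’s inequality,
$$\lim_{n\to\infty} \frac{1}{n}H(K|\hat{K}S_n) \leq \lim_{n\to\infty} \delta(P_e(S_n)) = 0,$$
and, from the definition of $L(S_n)$,
$$\lim_{n\to\infty} \frac{1}{n}I(K;Z^nA^rB^r|S_n) = \lim_{n\to\infty} \frac{1}{n}L(S_n) = 0.$$
Hence, the sequence of advantage-distillation rates $\{R(S_n)\}_{n\geq 1}$ satisfies
$$\begin{aligned}
\lim_{n\to\infty} R(S_n) &\geq \lim_{n\to\infty} \left(\frac{1}{n}I(X';Y') - \frac{1}{n}I(X';Z')\right) \\
&= \lim_{n\to\infty} \left(\frac{1}{n}H(K|S_n) - \frac{1}{n}H(K|\hat{K}S_n) - \frac{1}{n}I(K;Z^nA^rB^r|S_n)\right) \\
&= \lim_{n\to\infty} \left(\frac{1}{n}\log\lceil 2^{nR}\rceil - \frac{1}{n}U(S_n) - \frac{1}{n}H(K|\hat{K}S_n) - \frac{1}{n}I(K;Z^nA^rB^r|S_n)\right) \\
&= R.
\end{aligned}$$
Therefore, $R \leq D^{\mathrm{SM}}$; and, since R is an arbitrary achievable secret-key rate, $C_s^{\mathrm{SM}} \leq D^{\mathrm{SM}}$.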

To prove the reverse inequality, notice that repeated application of an advantage-distillation protocol $D_n$ creates a source model with DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$, whose secret-key capacity is $C_s^{\mathrm{SM}}(p_{X'Y'Z'})$; by Theorem 4.1, $C_s^{\mathrm{SM}}(p_{X'Y'Z'})$ satisfies
$$C_s^{\mathrm{SM}}(p_{X'Y'Z'}) \geq \max\left(I(X';Y') - I(X';Z'),\; I(X';Y') - I(Y';Z')\right).$$
The secret-key capacity $C_s^{\mathrm{SM}}(p_{X'Y'Z'})$ is expressed in bits per observation of the source $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$; hence, the corresponding secret-key rate in bits per observation of the source $(\mathcal{XYZ}, p_{XYZ})$ is $(1/n)C_s^{\mathrm{SM}}(p_{X'Y'Z'})$ and, by definition, it cannot exceed $C_s^{\mathrm{SM}}(p_{XYZ})$. Therefore,
$$\begin{aligned}
C_s^{\mathrm{SM}}(p_{XYZ}) &\geq \frac{1}{n}C_s^{\mathrm{SM}}(p_{X'Y'Z'}) \\
&\geq \frac{1}{n}\max\left(I(X';Y') - I(X';Z'),\; I(X';Y') - I(Y';Z')\right) \\
&= R(D_n).
\end{aligned}$$
Since the advantage-distillation rate $R(D_n)$ can be arbitrarily close to $D^{\mathrm{SM}}(p_{XYZ})$, we
obtain
$$C_s^{\mathrm{SM}}(p_{XYZ}) \geq D^{\mathrm{SM}}(p_{XYZ}).$$


Proposition 4.2 does not provide an explicit characterization of the advantage- 
distillation capacity, but it shows that the secret-key capacity is equal to the maximum 
rate at which Alice and Bob can “generate an advantage” over Eve. This formalizes the 
intuition that the amount of secrecy that Alice and Bob can extract from their observations 
is related to the advantage they have over Eve. In addition, the proof shows that there is 
no loss of optimality in starting a key-distillation strategy with an advantage-distillation 
phase. 

Unfortunately, there is no known generic procedure for designing advantage- 
distillation protocols because the distillation heavily depends on the specific statistics of 
the underlying DMS (XYZ, pxyz). Nevertheless, we illustrate the concept by analyz- 
ing a protocol for the satellite scenario of Figure 4.5. This protocol, which is called the 
“repetition” protocol, is due to Maurer and operates as follows. 


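1. Alice, Bob, and Eve observe m realizations of the DMS, denoted by $X^m$, $Y^m$, and $Z^m$,
respectively.

2. Alice generates a bit $V \sim \mathcal{B}(1/2)$.

3. Alice creates a vector $V^m \triangleq (V, \dots, V)$ that consists of m repetitions of the same bit
V, and transmits the bit-wise sum $V^m \oplus X^m$ over the public channel.

4. Upon reception of $V^m \oplus X^m$, Bob uses his observations $Y^m$ to compute $V^m \oplus X^m \oplus Y^m$ and defines
$$\tilde{Y} \triangleq \begin{cases} 0 & \text{if } V^m \oplus X^m \oplus Y^m = (0, 0, \dots, 0), \\ 1 & \text{if } V^m \oplus X^m \oplus Y^m = (1, 1, \dots, 1), \\ ? & \text{else.} \end{cases}$$
Bob then sends Alice a message F over the public channel defined as
$$F \triangleq \begin{cases} 1 & \text{if } \tilde{Y} \neq ?, \\ 0 & \text{if } \tilde{Y} = ?. \end{cases}$$
Notice that F carries information about $\tilde{Y}$ but does not reveal its exact value.

5. Upon reception of F, Alice defines $\tilde{X}$ as
$$\tilde{X} \triangleq \begin{cases} V & \text{if } F = 1, \\ ? & \text{if } F = 0. \end{cases}$$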


By reiterating the repetition protocol multiple times, Alice and Bob effectively create a new DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ with components $X' \triangleq \tilde{X}F$, $Y' \triangleq \tilde{Y}$, and $Z' \triangleq (Z^m, V^m \oplus X^m, F)$. The protocol can be thought of as a post-selection procedure with
a repetition code, by which Alice and Bob retain a bit only if their observations X” 
and Y” are highly correlated or anticorrelated. Since Eve is a passive eavesdropper and 
has no control over the post-selection, she cannot bias Alice or Bob towards selecting 
observations that would be favorable to her. 
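The repetition protocol is simple enough to simulate directly. The Python sketch below draws the satellite bits and the BSC errors with the standard `random` module and runs steps 2 to 5 of the protocol. Eve is not simulated, since she is passive. The sketch estimates the probability that Bob accepts (F = 1) and the probability that Alice’s and Bob’s bits differ after acceptance. The function names are hypothetical, and the closed-form expressions used for comparison are those of Proposition 4.3.

```python
import random


def run_repetition_protocol(rng, m, p, q):
    """One run of the repetition protocol; returns (x_tilde, y_tilde), None = '?'."""
    u = [rng.getrandbits(1) for _ in range(m)]           # satellite bits U^m
    x = [ui ^ (rng.random() < p) for ui in u]            # Alice's BSC(p) output X^m
    y = [ui ^ (rng.random() < q) for ui in u]            # Bob's BSC(q) output Y^m
    v = rng.getrandbits(1)                               # step 2: V ~ B(1/2)
    public = [v ^ xi for xi in x]                        # step 3: V^m xor X^m
    s = [ai ^ yi for ai, yi in zip(public, y)]           # step 4: V^m xor X^m xor Y^m
    if not any(s):
        y_tilde = 0
    elif all(s):
        y_tilde = 1
    else:
        y_tilde = None                                   # erasure '?'
    f = 0 if y_tilde is None else 1                      # Bob's public flag F
    x_tilde = v if f == 1 else None                      # step 5
    return x_tilde, y_tilde


def estimate_alpha_beta(m, p, q, trials=100_000, seed=1):
    """Empirical P[F = 1] and P[x_tilde != y_tilde | F = 1]."""
    rng = random.Random(seed)
    accepted = disagreements = 0
    for _ in range(trials):
        x_tilde, y_tilde = run_repetition_protocol(rng, m, p, q)
        if y_tilde is not None:
            accepted += 1
            disagreements += x_tilde != y_tilde
    return accepted / trials, disagreements / accepted


def closed_form_alpha_beta(m, p, q):
    """alpha and beta as given in Proposition 4.3."""
    agree = (1 - p) * (1 - q) + p * q
    differ = (1 - p) * q + p * (1 - q)
    alpha = agree ** m + differ ** m
    return alpha, differ ** m / alpha


print(estimate_alpha_beta(3, 0.2, 0.2), closed_form_alpha_beta(3, 0.2, 0.2))
```

Increasing m makes acceptance rarer but the accepted bits more reliable, which is the trade-off the post-selection exploits.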

We now compute I(X’; Y’) and I(X’; Z’) obtained with the repetition protocol in order 
to characterize its advantage-distillation rate. 
Proposition 4.3. The mutual information between Alice and Bob after advantage distillation with the repetition protocol is
$$I(X';Y') = H_b(\alpha) + \alpha\left(1 - H_b(\beta)\right),$$
with
$$\alpha = (\bar{p}\bar{q} + pq)^m + (\bar{p}q + p\bar{q})^m, \qquad \beta = \frac{(\bar{p}q + p\bar{q})^m}{(\bar{p}\bar{q} + pq)^m + (\bar{p}q + p\bar{q})^m},$$
and $\bar{p} = 1 - p$ and $\bar{q} = 1 - q$.


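Proof. The mutual information between Alice and Bob after the repetition protocol can
be written
$$I(X';Y') = I(\tilde{X}F;\tilde{Y}) = I(F;\tilde{Y}) + I(\tilde{X};\tilde{Y}|F).$$
By construction, $H(F|\tilde{Y}) = 0$ and $I(\tilde{X};\tilde{Y}|F = 0) = 0$; therefore,
$$I(X';Y') = H(F) + \mathbb{P}[F = 1]\, I(\tilde{X};\tilde{Y}|F = 1). \tag{4.28}$$
We compute each term on the right-hand side of (4.28) separately. By construction, $\mathbb{P}[F = 1]$ is the probability that Bob obtains $\tilde{Y} \in \{0, 1\}$, which can be computed explicitly as
$$\begin{aligned}
\alpha \triangleq \mathbb{P}[F = 1] &= \mathbb{P}\left[X^m \oplus Y^m = (0, \dots, 0) \text{ or } X^m \oplus Y^m = (1, \dots, 1)\right] \\
&= (\bar{p}\bar{q} + pq)^m + (\bar{p}q + p\bar{q})^m.
\end{aligned} \tag{4.29}$$
The probability that $\tilde{Y}$ differs from $\tilde{X}$ given $F = 1$ is simply the probability that $X^m \oplus Y^m = (1, \dots, 1)$ given $F = 1$; hence,
$$\beta \triangleq \mathbb{P}[\tilde{X} \neq \tilde{Y}|F = 1] = \frac{(\bar{p}q + p\bar{q})^m}{(\bar{p}\bar{q} + pq)^m + (\bar{p}q + p\bar{q})^m}. \tag{4.30}$$
Given $F = 1$, $\tilde{X} = V$ is uniformly distributed and the conditional probabilities $p_{\tilde{Y}|\tilde{X}, F=1}$ are those of a BSC with cross-over probability $\mathbb{P}[\tilde{X} \neq \tilde{Y}|F = 1] = \beta$; therefore,
$$I(\tilde{X};\tilde{Y}|F = 1) = 1 - H_b(\beta). \tag{4.31}$$
On combining (4.29), (4.30), and (4.31) in (4.28), we obtain
$$I(X';Y') = H_b(\alpha) + \alpha\left(1 - H_b(\beta)\right).$$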
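For small m, Proposition 4.3 can also be checked by exact enumeration. The sketch below enumerates the error patterns $E_X^m$ and $E_Y^m$ of Alice’s and Bob’s BSCs together with V. It builds the exact joint pmf of $X' = (\tilde{X}, F)$ and $Y' = \tilde{Y}$ and compares $I(X';Y')$ with $H_b(\alpha) + \alpha(1 - H_b(\beta))$. The sketch relies on the fact, used in the proof, that $\tilde{X}$ and $\tilde{Y}$ depend on the source only through $X^m \oplus Y^m = E_X^m \oplus E_Y^m$, so $U^m$ need not be enumerated. The erasure symbol ? is represented by `None`, and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def mutual_info(joint):
    """I(A;B) in bits from a pmf {(a, b): prob}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), pr in joint.items():
        pa[a] += pr
        pb[b] += pr
    return sum(pr * log2(pr / (pa[a] * pb[b])) for (a, b), pr in joint.items() if pr > 0)


def exact_i_xy_prime(m, p, q):
    """I(X';Y') for the repetition protocol by exhaustive enumeration."""
    joint = defaultdict(float)
    for ex, ey in product(product((0, 1), repeat=m), repeat=2):
        weight = 1.0
        for bit in ex:
            weight *= p if bit else 1 - p
        for bit in ey:
            weight *= q if bit else 1 - q
        d = [a ^ b for a, b in zip(ex, ey)]        # X^m xor Y^m
        for v in (0, 1):                           # V ~ B(1/2)
            if not any(d):
                y_tilde = v                        # Bob recovers V
            elif all(d):
                y_tilde = 1 - v                    # Bob recovers the complement of V
            else:
                y_tilde = None                     # erasure '?'
            f = int(y_tilde is not None)
            x_prime = (v if f else None, f)        # X' = (X~, F)
            joint[(x_prime, y_tilde)] += 0.5 * weight
    return mutual_info(joint)


def closed_form_i_xy_prime(m, p, q):
    """H_b(alpha) + alpha (1 - H_b(beta)) from Proposition 4.3."""
    agree = (1 - p) * (1 - q) + p * q
    differ = (1 - p) * q + p * (1 - q)
    alpha = agree ** m + differ ** m
    beta = differ ** m / alpha
    return h2(alpha) + alpha * (1 - h2(beta))


for m in (1, 2, 3, 4):
    print(m, exact_i_xy_prime(m, 0.2, 0.2), closed_form_i_xy_prime(m, 0.2, 0.2))
```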


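Proposition 4.4. The mutual information between Alice and Eve after advantage distillation with the repetition protocol is
$$I(X';Z') = H_b(\alpha) + \alpha \sum_{k=0}^{m} \binom{m}{k} \rho_k \left(1 - H_b\!\left(\frac{\rho_k}{\rho_k + \rho_{m-k}}\right)\right),$$
with $\alpha = (\bar{p}\bar{q} + pq)^m + (\bar{p}q + p\bar{q})^m$, $\bar{r} = 1 - r$, and
$$\rho_k \triangleq \frac{1}{\alpha}(\bar{p}\bar{q}r + pq\bar{r})^k(\bar{p}\bar{q}\bar{r} + pqr)^{m-k} + \frac{1}{\alpha}(\bar{p}qr + p\bar{q}\bar{r})^k(\bar{p}q\bar{r} + p\bar{q}r)^{m-k}.$$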
Proof. Because Eve may not perform any hard decision on the bit V, evaluating Eve’s probability of error³ does not suffice to compute $I(X';Z')$. Nevertheless, the satellite source model and the repetition protocol are simple enough that we can compute $I(X';Z')$ in closed form. Using the chain rule,
$$\begin{aligned}
I(X';Z') &= I(\tilde{X}F; Z^m, V^m \oplus X^m, F) \\
&= H(\tilde{X}F) - H(\tilde{X}F|Z^m, V^m \oplus X^m, F) \\
&= H(F) + H(\tilde{X}|F) - H(\tilde{X}|Z^m, V^m \oplus X^m, F).
\end{aligned} \tag{4.32}$$


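Notice that the pair $(Z^m, V^m \oplus X^m)$ uniquely determines $(Z^m, V^m \oplus X^m \oplus Z^m)$ and vice versa, so that $H(\tilde{X}|Z^m, V^m \oplus X^m, F) = H(\tilde{X}|Z^m, V^m \oplus X^m \oplus Z^m, F)$. In addition, $Z^m$ and $X^m$ are observations of the same i.i.d. sequence $U^m$ through independent BSCs. Hence, we can write $Z^m = U^m \oplus E_Z^m$ and $X^m = U^m \oplus E_X^m$, where $E_X^m$ and $E_Z^m$ are the independent error patterns introduced by the BSCs. Consequently,
$$\begin{aligned}
H(\tilde{X}|Z^m, V^m \oplus X^m, F) &= H(\tilde{X}|Z^m, V^m \oplus X^m \oplus Z^m, F) \\
&= H(\tilde{X}|U^m \oplus E_Z^m, V^m \oplus E_X^m \oplus E_Z^m, F) \\
&= H(\tilde{X}, U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F) - H(U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F) \\
&= H(\tilde{X}|V^m \oplus E_X^m \oplus E_Z^m, F) + H(U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F, \tilde{X}) \\
&\quad - H(U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F).
\end{aligned}$$
Since the sequence $U^m$ is uniformly distributed, the crypto lemma ensures that
$$H(U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F, \tilde{X}) = H(U^m \oplus E_Z^m|V^m \oplus E_X^m \oplus E_Z^m, F) = m,$$
and, therefore,
$$H(\tilde{X}|Z^m, V^m \oplus X^m, F) = H(\tilde{X}|V^m \oplus E_X^m \oplus E_Z^m, F). \tag{4.33}$$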
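On substituting (4.33) into (4.32), we obtain
$$\begin{aligned}
I(X';Z') &= H(F) + H(\tilde{X}|F) - H(\tilde{X}|V^m \oplus E_X^m \oplus E_Z^m, F) \\
&= H(F) + I(\tilde{X}; V^m \oplus E_X^m \oplus E_Z^m|F) \\
&= H(F) + \mathbb{P}[F = 1]\, I(\tilde{X}; V^m \oplus E_X^m \oplus E_Z^m|F = 1),
\end{aligned} \tag{4.34}$$
where we have used $I(\tilde{X}; Z^m, V^m \oplus X^m|F = 0) = 0$ to obtain the last equality. Given $F = 1$, note that $\tilde{X} = V$ and that the weight $W \triangleq w(V^m \oplus E_X^m \oplus E_Z^m)$ of the sequence $V^m \oplus E_X^m \oplus E_Z^m$ is a sufficient statistic for $\tilde{X}$ given $V^m \oplus E_X^m \oplus E_Z^m$. Hence,
$$I(X';Z') = H(F) + \mathbb{P}[F = 1]\, I(\tilde{X};W|F = 1).$$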


³ If we were to compute the probability of error for Eve under maximum-likelihood decoding, Fano’s inequality
would yield only a lower bound for $I(X';Z')$.
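Finally, all we need in order to compute $I(\tilde{X};W|F = 1)$ is the joint distribution $\mathbb{P}[W, \tilde{X}|F = 1]$. For any weight $k \in [0, m]$,
$$\begin{aligned}
\mathbb{P}[W = k, \tilde{X} = 0|F = 1] &= \mathbb{P}[w(V^m \oplus E_X^m \oplus E_Z^m) = k, \tilde{X} = 0|F = 1] \\
&= \mathbb{P}[w(E_X^m \oplus E_Z^m) = k, V = 0|F = 1] \\
&= \mathbb{P}[w(E_X^m \oplus E_Z^m) = k|F = 1]\,\mathbb{P}[V = 0].
\end{aligned}$$
By construction, $\mathbb{P}[V = 0] = 1/2$, and $\mathbb{P}[w(E_X^m \oplus E_Z^m) = k|F = 1]$ can be written as
$$\begin{aligned}
\frac{\mathbb{P}[w(E_X^m \oplus E_Z^m) = k, F = 1]}{\mathbb{P}[F = 1]}
&= \frac{1}{\alpha}\Big(\mathbb{P}[w(E_X^m \oplus E_Z^m) = k, X^m \oplus Y^m = (0, \dots, 0)] \\
&\qquad + \mathbb{P}[w(E_X^m \oplus E_Z^m) = k, X^m \oplus Y^m = (1, \dots, 1)]\Big) \\
&= \frac{1}{\alpha}\binom{m}{k}\Big((\bar{p}\bar{q}r + pq\bar{r})^k(\bar{p}\bar{q}\bar{r} + pqr)^{m-k} + (\bar{p}qr + p\bar{q}\bar{r})^k(\bar{p}q\bar{r} + p\bar{q}r)^{m-k}\Big).
\end{aligned}$$
Upon defining $\rho_k$ as
$$\rho_k \triangleq \frac{1}{\alpha}(\bar{p}\bar{q}r + pq\bar{r})^k(\bar{p}\bar{q}\bar{r} + pqr)^{m-k} + \frac{1}{\alpha}(\bar{p}qr + p\bar{q}\bar{r})^k(\bar{p}q\bar{r} + p\bar{q}r)^{m-k},$$
we obtain
$$\mathbb{P}[W = k, \tilde{X} = 0|F = 1] = \frac{1}{2}\binom{m}{k}\rho_k. \tag{4.35}$$
Similarly,
$$\mathbb{P}[W = k, \tilde{X} = 1|F = 1] = \frac{1}{2}\binom{m}{k}\rho_{m-k}, \tag{4.36}$$
and, on combining (4.35) and (4.36), we have
$$\mathbb{P}[W = k|F = 1] = \frac{1}{2}\binom{m}{k}(\rho_k + \rho_{m-k}).$$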
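We can now evaluate $I(\tilde{X};W|F = 1)$ explicitly as
$$\begin{aligned}
I(\tilde{X};W|F = 1) &= \sum_{k=0}^{m}\Bigg(\mathbb{P}[W = k, \tilde{X} = 0|F = 1]\log\frac{\mathbb{P}[W = k|\tilde{X} = 0, F = 1]}{\mathbb{P}[W = k|F = 1]} \\
&\qquad + \mathbb{P}[W = k, \tilde{X} = 1|F = 1]\log\frac{\mathbb{P}[W = k|\tilde{X} = 1, F = 1]}{\mathbb{P}[W = k|F = 1]}\Bigg) \\
&= \sum_{k=0}^{m}\frac{1}{2}\binom{m}{k}\rho_k\log\frac{2\rho_k}{\rho_k + \rho_{m-k}} + \sum_{k=0}^{m}\frac{1}{2}\binom{m}{k}\rho_{m-k}\log\frac{2\rho_{m-k}}{\rho_k + \rho_{m-k}} \\
&= \sum_{k=0}^{m}\frac{1}{2}\binom{m}{k}(\rho_k + \rho_{m-k})\left(1 - H_b\!\left(\frac{\rho_k}{\rho_k + \rho_{m-k}}\right)\right) \\
&= \sum_{k=0}^{m}\binom{m}{k}\rho_k\left(1 - H_b\!\left(\frac{\rho_k}{\rho_k + \rho_{m-k}}\right)\right),
\end{aligned}$$
where the last equality follows on substituting $k \to m - k$ in the terms involving $\rho_{m-k}$, because $\binom{m}{k} = \binom{m}{m-k}$ and
$$H_b\!\left(\frac{\rho_k}{\rho_k + \rho_{m-k}}\right) = H_b\!\left(1 - \frac{\rho_k}{\rho_k + \rho_{m-k}}\right) = H_b\!\left(\frac{\rho_{m-k}}{\rho_k + \rho_{m-k}}\right).$$
On substituting (4.29) and this expression for $I(\tilde{X};W|F = 1)$ into (4.34), we obtain the desired result
$$I(X';Z') = H_b(\alpha) + \alpha\sum_{k=0}^{m}\binom{m}{k}\rho_k\left(1 - H_b\!\left(\frac{\rho_k}{\rho_k + \rho_{m-k}}\right)\right).$$

Figure 4.6 Advantage-distillation rate for different values of the repetition-protocol parameter m.
The parameters of the satellite source are p = q = 0.2 and r = 0.15.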

Figure 4.6 illustrates the advantage-distillation rate of the protocol for a satellite source
with p = q = 0.2 and r = 0.15 for various values of the repetition parameter m. Note 
that, on choosing m large enough, the protocol achieves a strictly positive advantage- 
distillation rate; however, the protocol is inefficient and the advantage-distillation rates 
are quite low. For instance, with p = q = 0.2, r = 0.15, and m = 3, the advantage- 
distillation rate is on the order of 0.005 bits per observation of the DMS. This rate is 
relatively low because the post-selection with a repetition code is extremely wasteful 
of source observations. It is possible to improve these rates by using slightly better 
codes, and we refer the reader to the bibliographical notes at the end of this chapter for 
additional details. 
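Propositions 4.3 and 4.4 together give the advantage-distillation rate of the repetition protocol in closed form. The sketch below computes $(1/m)(I(X';Y') - I(X';Z'))$, which is one of the two terms in the maximum defining $R(D_m)$ and hence a lower bound on it. The other term, involving $I(Y';Z')$, is not derived in the text and is not computed here. Function names are hypothetical. For $p = q = 0.2$ and $r = 0.15$, the computed term is negative for $m = 1$ and small but positive for $m = 3$. This is consistent with the behavior described above.

```python
from math import comb, log2


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def repetition_protocol_rate(m, p, q, r):
    """(1/m) (I(X';Y') - I(X';Z')) for the repetition protocol (Props. 4.3 and 4.4)."""
    pb, qb, rb = 1 - p, 1 - q, 1 - r
    alpha = (pb * qb + p * q) ** m + (pb * q + p * qb) ** m
    beta = (pb * q + p * qb) ** m / alpha
    i_xy = h2(alpha) + alpha * (1 - h2(beta))

    def rho(k):
        # Term with X^m xor Y^m = (0,...,0), then term with X^m xor Y^m = (1,...,1).
        agree = (pb * qb * r + p * q * rb) ** k * (pb * qb * rb + p * q * r) ** (m - k)
        differ = (pb * q * r + p * qb * rb) ** k * (pb * q * rb + p * qb * r) ** (m - k)
        return (agree + differ) / alpha

    eve = sum(comb(m, k) * rho(k) * (1 - h2(rho(k) / (rho(k) + rho(m - k))))
              for k in range(m + 1))
    i_xz = h2(alpha) + alpha * eve
    return (i_xy - i_xz) / m


for m in range(1, 8):
    print(m, round(repetition_protocol_rate(m, 0.2, 0.2, 0.15), 5))
```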


Information reconciliation 


After the advantage-distillation phase, Alice, Bob, and Eve obtain the realizations
of a DMS (𝒳'𝒴'𝒵', p_X'Y'Z'). The objective of the information-reconciliation phase
(reconciliation for short) is to allow Alice and Bob to agree on a common sequence S,
but, at this stage, the sequence S is not subject to any secrecy constraint. To simplify the
notation, we assume that the reconciliation protocol operates on the DMS (𝒳𝒴𝒵, p_XYZ)
instead of on the DMS (𝒳'𝒴'𝒵', p_X'Y'Z') obtained with advantage distillation.

We place few restrictions on how reconciliation should be performed. The common 
sequence S could be a function of Alice and Bob’s observations and of messages 
exchanged interactively over the public authenticated channel. Alice and Bob are also 
allowed to randomize their operations using sources of local randomness. Formally, a 
reconciliation protocol is defined as follows. 


Definition 4.12. A reconciliation protocol ℛ_n for a source model with DMS
(𝒳𝒴𝒵, p_XYZ) consists of


• an alphabet 𝒮 = ⟦1, S⟧;

• a source of local randomness (ℛ_X, p_RX) for Alice;

• a source of local randomness (ℛ_Y, p_RY) for Bob;

• an integer r ∈ ℕ* that represents the number of rounds of communication;

• r encoding functions f_i : 𝒳^n × ℬ^(i−1) × ℛ_X → 𝒜 for i ∈ ⟦1, r⟧;

• r encoding functions g_i : 𝒴^n × 𝒜^i × ℛ_Y → ℬ for i ∈ ⟦1, r⟧;

• a function η_a : 𝒳^n × ℬ^r × ℛ_X → 𝒮;

• a function η_b : 𝒴^n × 𝒜^r × ℛ_Y → 𝒮;


and operates as follows:


• Alice observes n realizations of the source x^n while Bob observes y^n;

• Alice generates a realization r_x of her source of local randomness while Bob generates
r_y from his;

• in round i ∈ ⟦1, r⟧, Alice transmits a_i = f_i(x^n, b^(i−1), r_x) while Bob transmits
b_i = g_i(y^n, a^i, r_y);

• after round r, Alice computes s = η_a(x^n, b^r, r_x) while Bob computes
ŝ = η_b(y^n, a^r, r_y).


By convention, A^0 ≜ ∅ and B^0 ≜ ∅. The number of rounds r and the sources of local
randomness can be optimized during the design of the reconciliation protocol. Note that
the definition of a reconciliation protocol is slightly different from that of an
advantage-distillation protocol because the objective is not the same. The goal of the
reconciliation protocol is to guarantee that Alice and Bob agree on a common sequence;
hence, the output alphabets of the functions η_a and η_b are the same. The goal of an
advantage-distillation protocol is merely to generate a new source; hence, the output
alphabets of the functions θ_a and θ_b in Definition 4.9 could be different.
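To make Definition 4.12 concrete, here is a minimal Python sketch of a one-round (r = 1) direct reconciliation protocol without local randomness. It follows the source-coding idea used later in the proof of Proposition 4.5. The (7, 4) Hamming code is our own illustrative choice, not a construction from the text. Alice's message a_1 = f_1(x^7) is the 3-bit syndrome of her block, and Bob sends nothing. η_a returns x^7, and η_b corrects Bob's block y^7. Correction succeeds whenever x^7 and y^7 differ in at most one position.

```python
# Parity-check matrix of the (7, 4) Hamming code: column j is the binary
# expansion of j + 1, so a single discrepancy at position j has syndrome j + 1.
H = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]


def syndrome(v: list[int]) -> list[int]:
    """Compute H v over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]


def alice_message(x: list[int]) -> list[int]:
    """f_1: Alice discloses the 3-bit syndrome of her 7-bit block."""
    return syndrome(x)


def bob_estimate(y: list[int], a: list[int]) -> list[int]:
    """eta_b: Bob corrects y with Alice's syndrome (at most one discrepancy)."""
    s = [(u + v) % 2 for u, v in zip(syndrome(y), a)]  # syndrome of x XOR y
    pos = s[0] + 2 * s[1] + 4 * s[2]                    # 0 means no discrepancy
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1]
    y = list(x)
    y[4] ^= 1                                        # Bob's block differs in one position
    print(bob_estimate(y, alice_message(x)) == x)    # True
```

The protocol discloses 3 bits per 7-bit block. Blocks in which X^7 and Y^7 differ in two or more positions are decoded incorrectly.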

The reliability performance of a reconciliation protocol is measured in terms of the
average probability of error

$$P_e(\mathcal{R}_n) \triangleq P\big[\hat{S} \neq S \mid \mathcal{R}_n\big].$$


In addition, since the common sequence S generated by a reconciliation protocol is
eventually processed to generate a secret key, it is desirable that the protocol leaks as
little information as possible over the public channel. Hence, a reasonable measure of
performance is the difference between the entropy of the common sequence H(S) and
the amount of public information H(A^rB^r) exchanged over the public channel. The
quantity

$$R(\mathcal{R}_n) \triangleq \frac{1}{n}\Big(H(S \mid \mathcal{R}_n) - H(A^rB^r \mid \mathcal{R}_n)\Big)$$

is called the reconciliation rate of a reconciliation protocol. The choice of this measure
is fully justified when we discuss privacy amplification in Section 4.3.3.


Definition 4.13. A reconciliation rate R is achievable if there exists a sequence of
reconciliation protocols {ℛ_n}_{n≥1} such that

$$\lim_{n\to\infty} P_e(\mathcal{R}_n) = 0 \quad\text{and}\quad \lim_{n\to\infty} R(\mathcal{R}_n) \geq R.$$


Definition 4.14. The reconciliation capacity R^rec of a DMS (𝒳𝒴, p_XY) is

$$R^{\mathrm{rec}} \triangleq \sup\{R : R \text{ is an achievable reconciliation rate}\}.$$


Proposition 4.5. The reconciliation capacity of a source model with DMS
(𝒳𝒴𝒵, p_XYZ) is

$$R^{\mathrm{rec}} = I(X;Y).$$

In addition, the reconciliation rates below R^rec are achievable with one-way
communication and without sources of local randomness.


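Proof. We start by proving that the reconciliation capacity cannot exceed I(X;Y). Let
R be an achievable reconciliation rate. By definition, for any ε > 0, there exists a
reconciliation protocol ℛ_n such that

$$P_e(\mathcal{R}_n) \leq \delta(\epsilon) \quad\text{and}\quad R(\mathcal{R}_n) \geq R - \delta(\epsilon). \qquad (4.37)$$

In the remainder of the proof, we omit the conditioning on ℛ_n to simplify the notation.
Fano's inequality guarantees that (1/n)H(S|Ŝ) ≤ δ(P_e(ℛ_n)) ≤ δ(ε); therefore,

$$
\begin{aligned}
R &\leq R(\mathcal{R}_n) + \delta(\epsilon)\\
&= \frac{1}{n}H(S) - \frac{1}{n}H(A^rB^r) + \delta(\epsilon)\\
&= \frac{1}{n}I(S;\hat{S}) + \frac{1}{n}H(S \mid \hat{S}) - \frac{1}{n}H(A^rB^r) + \delta(\epsilon)\\
&\leq \frac{1}{n}I(S;\hat{S}) - \frac{1}{n}H(A^rB^r) + \delta(\epsilon). \qquad (4.38)
\end{aligned}
$$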


Since S − X^nR_XA^rB^r − Y^nR_YA^rB^r − Ŝ forms a Markov chain, the data-processing
inequality ensures that

$$I(S;\hat{S}) \leq I(X^nR_XA^rB^r;\, Y^nR_YA^rB^r). \qquad (4.39)$$

We further expand I(X^nR_XA^rB^r; Y^nR_YA^rB^r) as

$$
\begin{aligned}
I(X^nR_XA^rB^r;\, Y^nR_YA^rB^r)
&= I(X^nR_X;\, Y^nR_YA^rB^r) + I(A^rB^r;\, Y^nR_YA^rB^r \mid X^nR_X)\\
&= I(X^nR_X;\, Y^nR_Y \mid A^rB^r) + I(X^nR_X;\, A^rB^r) + I(A^rB^r;\, Y^nR_YA^rB^r \mid X^nR_X)\\
&= I(X^nR_X;\, Y^nR_Y \mid A^rB^r) + H(A^rB^r) - H(A^rB^r \mid X^nR_X)\\
&\quad + I(A^rB^r;\, Y^nR_YA^rB^r \mid X^nR_X). \qquad (4.40)
\end{aligned}
$$

Note that

$$I(A^rB^r;\, Y^nR_YA^rB^r \mid X^nR_X) = H(A^rB^r \mid X^nR_X). \qquad (4.41)$$

By applying Lemma 4.2 with S ≜ X^nR_X, T ≜ Y^nR_Y, V^r ≜ A^r, and W^r ≜ B^r, we obtain

$$I(X^nR_X;\, Y^nR_Y \mid A^rB^r) \leq I(X^nR_X;\, Y^nR_Y).$$

Since the DMSs (ℛ_X, p_RX) and (ℛ_Y, p_RY) are mutually independent and independent
of the DMS (𝒳𝒴𝒵, p_XYZ), we have

$$I(X^nR_X;\, Y^nR_Y \mid A^rB^r) \leq nI(X;Y). \qquad (4.42)$$

On combining (4.41) and (4.42) in (4.40), we have

$$I(X^nR_XA^rB^r;\, Y^nR_YA^rB^r) \leq nI(X;Y) + H(A^rB^r), \qquad (4.43)$$

and, using (4.43) in (4.39) and (4.38), we obtain

$$R \leq I(X;Y) + \delta(\epsilon).$$


Since ε can be chosen arbitrarily small and R can be chosen arbitrarily close to R^rec, it
must hold that R^rec ≤ I(X;Y).

We now show that all reconciliation rates below I(X;Y) are achievable. This result follows
directly from Corollary 2.4. In fact, for any ε > 0, Corollary 2.4 guarantees the existence
of a (2^{nR}, n) code 𝒞_n that compresses X^n into a message A at rate R ≤ H(X|Y) + δ(ε)
and such that X^n can be retrieved from Y^n and A with probability of error P_e(𝒞_n) ≤ δ(ε).
Such a code can be viewed as a reconciliation protocol ℛ_n without sources of local
randomness, for which S = X^n and in which there is a single public message A of about
nH(X|Y) bits exchanged over the public channel. The corresponding reconciliation rate
is

$$
\begin{aligned}
R(\mathcal{R}_n) &= \frac{1}{n}\big(H(S) - H(A)\big)\\
&\geq \frac{1}{n}H(X^n) - R\\
&\geq I(X;Y) - \delta(\epsilon).
\end{aligned}
$$

Hence, all reconciliation rates R < I(X;Y) are achievable without sources of local
randomness and with one-way communication.

The achievability proof of Proposition 4.5 carries over directly if the roles of Alice
and Bob are interchanged. By having Alice estimate Y^n and Bob send information over
the public channel, Alice and Bob can recover a common sequence with entropy nH(Y)
while disclosing about nH(Y|X) bits. The rate of this reconciliation protocol is again on
the order of I(X;Y). Reconciliation protocols for which S = X^n are called direct
reconciliation protocols, while those for which S = Y^n are called reverse reconciliation
protocols. Both direct and reverse reconciliation protocols can achieve the reconciliation
capacity, but the keys that can be distilled subsequently might not be identical; this issue
is discussed further in Section 4.3.3.

Although Proposition 4.5 ensures the existence of reconciliation protocols achieving 
reconciliation rates arbitrarily close to I(X; Y), this limit cannot be exactly attained. 
Any practical finite-length reconciliation protocol introduces an overhead and dis- 
closes strictly more than nH(X|Y) bits over the public channel. It is convenient to 
account for this overhead by defining the efficiency of a reconciliation protocol as 
follows. 


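Definition 4.15. The efficiency β of a reconciliation protocol ℛ_n for a source model with
DMS (𝒳𝒴𝒵, p_XYZ) is

$$\beta \triangleq \frac{H(S) - r\log(|\mathcal{A}|\,|\mathcal{B}|)}{nI(X;Y)}.$$

The quantity r log(|𝒜||ℬ|) represents the number of bits required in order to describe
all messages exchanged over the public channel. Note that β ≤ 1 because

$$
\begin{aligned}
\beta I(X;Y) = \frac{1}{n}\Big(H(S) - r\log(|\mathcal{A}|\,|\mathcal{B}|)\Big)
&\leq \frac{1}{n}\Big(H(S) - H(A^rB^r)\Big)\\
&= R(\mathcal{R}_n)\\
&\leq R^{\mathrm{rec}}\\
&= I(X;Y). \qquad (4.44)
\end{aligned}
$$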


In terms of efficiency, Proposition 4.5 states that there exist reconciliation protocols with 
efficiency arbitrarily close to one. 
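For a one-way, syndrome-based protocol, Definition 4.15 reduces to a simple formula. The Python sketch below assumes, as a model of our own choosing, that X is a uniform bit and that Y is X observed through a binary symmetric channel with crossover probability p. Under that model I(X;Y) = 1 − H_b(p) and H(S) = n for S = X^n. A protocol that discloses d bits per block of n bits then has efficiency β = (n − d)/(n(1 − H_b(p))). The function names are hypothetical.

```python
import math


def hb(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def efficiency(n: int, disclosed_bits: int, p: float) -> float:
    """beta = (H(S) - r log(|A||B|)) / (n I(X;Y)) for S = X^n, uniform X, BSC(p).

    One-way communication with a single message of `disclosed_bits` bits, so
    r log(|A||B|) = disclosed_bits and H(S) = n.
    """
    return (n - disclosed_bits) / (n * (1.0 - hb(p)))


if __name__ == "__main__":
    # A (7, 4) Hamming syndrome discloses 3 bits per 7-bit block.
    print(efficiency(7, 3, 0.05))
```

For p = 0.05, this gives β ≈ 0.80. The definition of efficiency does not account for the residual error probability, which is not negligible for such a short code.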


Remark 4.6. With continuous correlated sources, lossless source coding with side 
information is not possible, since the discrepancies between continuous sources cannot 
be corrected exactly. Unlike traditional source coding problems, which can be analyzed 
in a rate-distortion framework, reconciliation requires Alice and Bob to agree on a 
common sequence for further processing. Consequently, the natural way of handling 
continuous sources is to quantize them to revert back to a discrete case. Assuming 
again that Alice's randomness X^n is chosen as the common sequence S, Alice can
generate a quantized version X_Δ^n of X^n with a scalar quantizer. The upper bound of
Proposition 4.5 applies even if Y^n is not quantized, and reconciliation rates can be no
greater than I(X_Δ; Y). By Corollary 2.4, the upper bound can be approached with one-way
communication only. Additionally, by choosing a fine enough quantizer, reconciliation
rates can be made arbitrarily close to I(X;Y), and quantization incurs a negligible
loss.


Figure 4.7 Principle of privacy amplification. Alice, Bob, and Eve apply a known function f to their
respective sequences S and Ŝ. Because of the discrepancies between Ŝ and S, the outputs K = f(S)
and f(Ŝ) are different. Eve cannot predict how her errors propagate and, for a well-chosen f, she
obtains no information about K.


Privacy amplification 


Privacy amplification is the final step of a sequential key-distillation strategy that allows 
Alice and Bob to distill a secret key. Specifically, the role of privacy amplification is 
to process the sequence S obtained by Alice and Bob after reconciliation to extract a 
shorter sequence of k bits that is provably unknown to Eve. Without loss of generality, 
we assume throughout this section that S is a binary sequence of n bits. 

Before we analyze privacy amplification in detail, it is useful to develop an intu- 
itive understanding of why and how this operation is possible. First, note that privacy 
amplification is straightforward in certain cases. For instance, 


• if Alice and Bob know that Eve has no information about S, then the sequence S itself
can be used as a key;

• if Alice and Bob know that Eve has access to S, then no secret key can be distilled;

• if Alice and Bob know that Eve has access to m bits of S, then the remaining n − m
bits of S can be used as a secret key.


In general, privacy amplification is not so simple because Alice and Bob only know a bound
on Eve's information, which cannot be tied to bits of S directly. Nevertheless, we show
that this bound is all Alice and Bob need to extract bits from S about which Eve has
little knowledge. Why this operation is possible can be understood intuitively as follows.
For simplicity, assume that Eve computes her best estimate Ŝ of S on the basis of her
observations of the source and all messages exchanged over the public channel by the
advantage-distillation and reconciliation protocols. Unless Eve's information about S
is exactly H(S), her estimate Ŝ differs from S in some positions. The key idea is that
Eve cannot determine the location of the discrepancies; consequently, as illustrated in
Figure 4.7, if Alice and Bob apply a deterministic transformation f to their sequence S
to shuffle and remove some bits, Eve cannot predict how her errors propagate and affect
her outcome f(Ŝ). We will show that there exist transformations that propagate Eve's
errors so much that all possible outcomes of the transformation f become equally likely
from Eve's perspective.

The precise analysis of privacy amplification is slightly involved because it does not 
rely on the Shannon entropy but on alternative measures called the collision entropy and 
the min-entropy. The detailed study of these entropies goes well beyond the scope of 
this book, and the following section presents only the properties that are useful in the 
context of secret-key distillation. We refer the interested reader to the bibliographical 
notes at the end of the chapter for further references. 


Collision entropy and min-entropy 

The collision entropy and the min-entropy are convenient metrics because they are 
tailored to the actual functions used for privacy amplification and because they are more 
sensitive than the Shannon entropy to deviations from uniform distributions. The latter 
property allows us to establish the achievability of strong secret-key rates rather than 
just weak secret-key rates. As an illustration, the following example exhibits a random 
variable that is not uniform but whose Shannon entropy is hardly distinguishable from 
that of a uniform random variable. 


Example 4.3. Consider a random variable K ∈ ⟦1, 2^k⟧ with probability distribution

$$P[K=1] = 2^{-k/4} \quad\text{and}\quad P[K=i] = \frac{1-2^{-k/4}}{2^k-1} \text{ for } i \neq 1.$$

In other words, K is close to uniform, except that the realization K = 1 has probability
2^{−k/4} while the others have probability ≈ 2^{−k}. If k is large, this slight non-uniformity is
not well captured by the Shannon entropy H(K). In fact,

$$
\begin{aligned}
H(K) &= 2^{-k/4}\log 2^{k/4} + (2^k-1)\,\frac{1-2^{-k/4}}{2^k-1}\log\frac{2^k-1}{1-2^{-k/4}}\\
&= \frac{k}{4}\,2^{-k/4} + \left(1-2^{-k/4}\right)\log\left(2^k-1\right) - \left(1-2^{-k/4}\right)\log\left(1-2^{-k/4}\right)\\
&= k + 2^{-k/4}\left(\frac{k}{4}-k\right) + \left(1-2^{-k/4}\right)\log\frac{1-2^{-k}}{1-2^{-k/4}}.
\end{aligned}
$$

Therefore, lim_{k→∞}(1/k)H(K) = 1, which obscures the fact that the random variable K
is not exactly uniform.


Definition 4.16. The collision entropy of a discrete random variable X ∈ 𝒳 is

$$H_c(X) \triangleq -\log \mathbb{E}\left[p_X(X)\right] = -\log\Big(\sum_{x\in\mathcal{X}} p_X(x)^2\Big).$$

For two discrete random variables X ∈ 𝒳 and Y ∈ 𝒴, the conditional collision entropy
of X given Y is

$$H_c(X \mid Y) \triangleq \sum_{y\in\mathcal{Y}} p_Y(y)\, H_c(X \mid Y=y).$$

The collision entropy bears such a name because it is a function of the collision
probability Σ_{x∈𝒳} p_X(x)^2, which measures the probability of obtaining the same realization
of a random variable twice in two independent experiments.
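This interpretation of the collision probability can be checked empirically. The Python sketch below uses a toy distribution of our choosing. It repeatedly draws two independent realizations and compares the observed frequency of coincidences with Σ_x p_X(x)^2. The collision entropy is then −log of that value, here −log(3/8) ≈ 1.415 bits.

```python
import random


def collision_probability(pmf: dict) -> float:
    """Exact collision probability: sum over x of p_X(x)^2."""
    return sum(p * p for p in pmf.values())


def empirical_collision(pmf: dict, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which two independent draws of X coincide."""
    rng = random.Random(seed)
    values, weights = list(pmf), list(pmf.values())
    first = rng.choices(values, weights=weights, k=trials)
    second = rng.choices(values, weights=weights, k=trials)
    return sum(a == b for a, b in zip(first, second)) / trials


if __name__ == "__main__":
    pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
    print(collision_probability(pmf), empirical_collision(pmf, 200_000))
```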


Proposition 4.6. For any discrete random variable X ∈ 𝒳, the collision entropy
satisfies H(X) ≥ H_c(X) ≥ 0. If X is uniformly distributed over 𝒳, then H(X) = H_c(X) =
log|𝒳|.


Proof. Using Jensen's inequality and the convexity of the function x ↦ −log x, we
obtain

$$H(X) = \mathbb{E}_X[-\log p_X(X)] \geq -\log \mathbb{E}_X[p_X(X)] = H_c(X).$$

In addition, since p_X(x) ≤ 1 for all x ∈ 𝒳, it holds that

$$\sum_{x\in\mathcal{X}} p_X(x)^2 \leq \sum_{x\in\mathcal{X}} p_X(x) = 1,$$

and, therefore, H_c(X) ≥ 0. If X is uniformly distributed, then p_X(x) = 1/|𝒳| for all
x ∈ 𝒳 and we obtain H_c(X) = log|𝒳| by direct calculation.


Remark 4.7. Many properties of the Shannon entropy H(X) do not hold for the collision
entropy H_c(X). For instance, conditioning may increase the collision entropy; that is,
some random variables are such that H_c(X|Y) > H_c(X). In such cases, Y is called the
spoiling knowledge.
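Here is a minimal example of spoiling knowledge, constructed for illustration and not taken from the text. Let X = 0 with probability 1/2, and otherwise let X be uniform over 2^n other values. Let Y = 1 if X = 0 and Y = 0 otherwise. Then H_c(X|Y = 1) = 0 and H_c(X|Y = 0) = n, so H_c(X|Y) = n/2. However, H_c(X) = −log(1/4 + 2^{−n−2}) < 2. The Python sketch below computes both quantities.

```python
import math


def collision_entropy(probs: list[float]) -> float:
    """H_c = -log2(sum of p^2) for a list of probabilities."""
    return -math.log2(sum(p * p for p in probs))


def spoiling_example(n: int) -> tuple[float, float]:
    """Return (H_c(X), H_c(X|Y)) for the construction described above."""
    m = 2 ** n
    px = [0.5] + [0.5 / m] * m          # P[X=0] = 1/2, the rest uniform over m values
    hc_x = collision_entropy(px)
    # Given Y=1, X=0 deterministically; given Y=0, X is uniform over m values.
    hc_x_given_y = (0.5 * collision_entropy([1.0])
                    + 0.5 * collision_entropy([1.0 / m] * m))
    return hc_x, hc_x_given_y


if __name__ == "__main__":
    print(spoiling_example(10))
```

Here Y is even a deterministic function of X, yet conditioning on it raises the collision entropy from about 2 bits to n/2 bits.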


Definition 4.17. The min-entropy of a discrete random variable X ∈ 𝒳 is

$$H_\infty(X) \triangleq -\log\Big(\max_{x\in\mathcal{X}} p_X(x)\Big).$$

For two discrete random variables X ∈ 𝒳 and Y ∈ 𝒴, the conditional min-entropy of X
given Y is

$$H_\infty(X \mid Y) \triangleq \sum_{y\in\mathcal{Y}} p_Y(y)\, H_\infty(X \mid Y=y).$$
yey 


Proposition 4.7. For any discrete random variable X ∈ 𝒳, the min-entropy satisfies
H_c(X) ≥ H_∞(X) ≥ 0. If X is uniformly distributed over 𝒳, then H(X) = H_c(X) =
H_∞(X) = log|𝒳|.


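Proof. Since p_X(x) ≤ 1 for all x ∈ 𝒳, it is obvious that H_∞(X) ≥ 0. Also,

$$
\begin{aligned}
H_c(X) = -\log\Big(\sum_{x\in\mathcal{X}} p_X(x)^2\Big)
&\geq -\log\Big(\sum_{x\in\mathcal{X}} p_X(x)\max_{x'\in\mathcal{X}} p_X(x')\Big)\\
&= -\log\Big(\max_{x\in\mathcal{X}} p_X(x)\Big) = H_\infty(X).
\end{aligned}
$$

If X is uniformly distributed, we obtain H_∞(X) = log|𝒳| by direct calculation.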


To illustrate that the collision entropy H_c(X) and the min-entropy H_∞(X) are indeed
stronger measures of uniformity than H(X), we revisit Example 4.3.

Example 4.4. Consider the random variable K ∈ ⟦1, 2^k⟧ with distribution

$$P[K=1] = 2^{-k/4} \quad\text{and}\quad P[K=i] = \frac{1-2^{-k/4}}{2^k-1} \text{ for } i \neq 1,$$

for which we showed that lim_{k→∞}(1/k)H(K) = 1 in Example 4.3. The collision entropy
of K is

$$
\begin{aligned}
H_c(K) &= -\log\left(2^{-k/2} + (2^k-1)\left(\frac{1-2^{-k/4}}{2^k-1}\right)^{2}\right)\\
&= \frac{k}{2} - \log\frac{2^k + 2^{k/2} - 2^{k/4+1}}{2^k-1}.
\end{aligned}
$$

Hence, lim_{k→∞}(1/k)H_c(K) = 1/2.
For k large enough, the min-entropy of K is

$$H_\infty(K) = -\log 2^{-k/4} = \frac{k}{4}.$$

Therefore, lim_{k→∞}(1/k)H_∞(K) = 1/4.
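The three limits of Examples 4.3 and 4.4 can be reproduced numerically. The Python sketch below is our own illustration. It evaluates H(K), H_c(K), and H_∞(K) directly from the distribution of K, using the fact that 2^k − 1 outcomes share the same probability, so the 2^k outcomes never have to be enumerated.

```python
import math


def entropies_example(k: int) -> tuple[float, float, float]:
    """Shannon, collision and min-entropy (bits) of K in Examples 4.3 and 4.4."""
    p1 = 2.0 ** (-k / 4)                # P[K = 1]
    others = 2 ** k - 1                 # number of remaining outcomes
    q = (1.0 - p1) / others             # P[K = i] for i != 1
    shannon = -p1 * math.log2(p1) - others * q * math.log2(q)
    collision = -math.log2(p1 * p1 + others * q * q)
    min_entropy = -math.log2(max(p1, q))
    return shannon, collision, min_entropy


if __name__ == "__main__":
    for k in (16, 64, 256):
        h, hc, hmin = entropies_example(k)
        print(k, h / k, hc / k, hmin / k)   # ratios approach 1, 1/2 and 1/4
```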


Remark 4.8. The collision entropy, the min-entropy, and the Shannon entropy are
special cases of the Rényi entropy. For a discrete random variable X, the Rényi entropy
of order α is

$$R_\alpha(X) = \frac{1}{1-\alpha}\log\Big(\sum_{x\in\mathcal{X}} p_X(x)^\alpha\Big).$$

One can check directly that

$$H_c(X) = R_2(X), \qquad H_\infty(X) = \lim_{\alpha\to\infty} R_\alpha(X),$$

and, using l'Hôpital's rule,

$$H(X) = \lim_{\alpha\to 1} R_\alpha(X).$$


Intuitively, the Shannon entropy, the collision entropy and the min-entropy of a random 
variable play the same role for discrete random variables as the arithmetic mean, 
geometric mean, and minimum value for a set of numbers. 
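The identities of Remark 4.8 can be checked numerically with the Python sketch below. It implements R_α and handles the cases α = 1 and α = ∞ through their limits. The example distribution is arbitrary. The ordering H(X) ≥ H_c(X) ≥ H_∞(X) of Propositions 4.6 and 4.7 appears as the non-increasing behavior of R_α in α.

```python
import math


def renyi_entropy(pmf: list[float], alpha: float) -> float:
    """Renyi entropy R_alpha in bits; alpha = 1 and alpha = inf use their limits."""
    support = [p for p in pmf if p > 0.0]
    if alpha == 1.0:
        # Limit alpha -> 1: Shannon entropy.
        return -sum(p * math.log2(p) for p in support)
    if math.isinf(alpha):
        # Limit alpha -> infinity: min-entropy.
        return -math.log2(max(support))
    return math.log2(sum(p ** alpha for p in support)) / (1.0 - alpha)


if __name__ == "__main__":
    pmf = [0.5, 0.25, 0.125, 0.125]
    for alpha in (0.5, 1.0, 1.000001, 2.0, 5.0, 200.0, math.inf):
        print(alpha, renyi_entropy(pmf, alpha))
```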


Privacy amplification with hash functions 

In this section, we introduce a generic privacy-amplification technique that exploits 
hash functions to distill a secret key. From the informal discussion at the beginning of 
Section 4.3.3, it should not be too surprising that hash functions play a role in privacy 
amplification. Hash functions are usually designed to produce significantly different 
outputs even when their inputs are quite similar, which is intuitively the sort of operation 
that we expect privacy amplification to perform. In what follows, we consider a specific 
class of hash functions called universal families. 




Definition 4.18. Given two finite sets 𝒜 and ℬ, a family 𝒢 of functions g : 𝒜 → ℬ is
2-universal (universal for short) if

$$\forall x_1, x_2 \in \mathcal{A},\; x_1 \neq x_2: \quad P[G(x_1) = G(x_2)] \leq \frac{1}{|\mathcal{B}|},$$

where G is the random variable that represents the choice of a function g ∈ 𝒢 uniformly
at random in 𝒢.


Universal families of hash functions have been thoroughly studied and we provide 
two well-known examples of families without proving their universality. 


Example 4.5. By identifying {0, 1}^n with GF(2)^n, we associate with any binary matrix
M ∈ GF(2)^{k×n} a function

$$h_M : \mathrm{GF}(2)^n \to \mathrm{GF}(2)^k : x \mapsto Mx.$$

The family of hash functions ℋ_1 = {h_M : M ∈ GF(2)^{k×n}} is universal.
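The universality of ℋ_1 can be verified exhaustively for small parameters. The Python sketch below encodes each row of M and each input x as an n-bit integer, which is our own encoding choice. It enumerates every matrix M ∈ GF(2)^{k×n} and every pair x_1 ≠ x_2 and returns the largest collision probability. For this family, the result should be exactly 2^{−k}.

```python
import itertools


def matrix_hash(rows: tuple[int, ...], x: int) -> int:
    """h_M(x) = Mx over GF(2); each row of M and the input x are n-bit integers."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & x).count("1") & 1) << i   # parity of <row, x>
    return out


def max_collision_probability(n: int, k: int) -> float:
    """max over x1 != x2 of P[h_M(x1) = h_M(x2)] with M uniform in GF(2)^{k x n}."""
    matrices = list(itertools.product(range(2 ** n), repeat=k))
    worst = 0.0
    for x1, x2 in itertools.combinations(range(2 ** n), 2):
        hits = sum(matrix_hash(m, x1) == matrix_hash(m, x2) for m in matrices)
        worst = max(worst, hits / len(matrices))
    return worst


if __name__ == "__main__":
    print(max_collision_probability(4, 2), 2.0 ** -2)
```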


Example 4.6. By identifying {0, 1}^n with GF(2^n), we associate with any element y ∈
GF(2^n) a function

$$h_y : \mathrm{GF}(2^n) \to \{0,1\}^k : x \mapsto k \text{ bits of the product } xy.$$

The k bits are fixed but their position can be chosen arbitrarily. The family of hash
functions ℋ_2 = {h_y : y ∈ GF(2^n)} is universal.
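Similarly, the Python sketch below implements ℋ_2 for small n and checks the universality bound exhaustively. It makes two assumptions of its own. First, {0, 1}^n is identified with GF(2^n) through a polynomial basis, with an irreducible modulus of our choosing, such as x^4 + x + 1 for GF(16). Second, the k retained bits are the least significant ones.

```python
def gf_mul(a: int, b: int, n: int, modulus: int) -> int:
    """Multiply a and b in GF(2^n), defined by an irreducible polynomial `modulus`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:                 # degree reached n: reduce modulo the polynomial
            a ^= modulus
    return result


def field_hash(y: int, x: int, n: int, k: int, modulus: int) -> int:
    """h_y(x): keep the k least significant bits of the product xy."""
    return gf_mul(x, y, n, modulus) & ((1 << k) - 1)


def max_collision_probability_gf(n: int, k: int, modulus: int) -> float:
    """max over x1 != x2 of P[h_Y(x1) = h_Y(x2)] with Y uniform in GF(2^n)."""
    size = 2 ** n
    worst = 0.0
    for x1 in range(size):
        for x2 in range(x1 + 1, size):
            hits = sum(field_hash(y, x1, n, k, modulus) == field_hash(y, x2, n, k, modulus)
                       for y in range(size))
            worst = max(worst, hits / size)
    return worst


if __name__ == "__main__":
    # x^4 + x + 1 (binary 10011) is irreducible over GF(2), so it defines GF(16).
    print(max_collision_probability_gf(4, 2, 0b10011))
```

The bound is met with equality because, for x_1 ≠ x_2, the product y(x_1 ⊕ x_2) runs over all of GF(2^n) as y does.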


Remark 4.9. Identifying a function in the family ℋ_1 requires nk bits, while identifying
a function in the family ℋ_2 requires only n bits. This difference does not
affect the operation of privacy amplification, but, in practice, it is often desirable to
limit the amount of communication; hence, the family ℋ_2 would be preferred to the
family ℋ_1.


The usefulness of universal families of hash functions for privacy amplification is 
justified in the following theorem. 


Theorem 4.4 (Bennett et al.). Let S ∈ {0, 1}^n be the random variable that represents
the common sequence shared by Alice and Bob, and let E be the random variable that
represents the total knowledge about S available to Eve. Let e be a particular realization
of E. If Alice and Bob know the conditional collision entropy H_c(S|E = e) to be at
least some constant c, and if they choose K = G(S) as their secret key, where G is a
hash function chosen uniformly at random from a universal family of hash functions
G : {0, 1}^n → {0, 1}^k, then

$$H(K \mid G, E=e) \geq k - \frac{2^{k-c}}{\ln 2}.$$

Proof. Since H(K|G, E = e) ≥ H_c(K|G, E = e) by Proposition 4.6, it suffices to establish
a lower bound for the collision entropy H_c(K|G, E = e). Note that

$$
\begin{aligned}
H_c(K \mid G, E=e) &= \sum_{g\in\mathcal{G}} p_G(g)\, H_c(K \mid G=g, E=e)\\
&= \sum_{g\in\mathcal{G}} p_G(g)\Big(-\log \mathbb{E}_{K\mid G=g,E=e}\big[p_{K\mid GE}(K \mid g,e)\big]\Big)\\
&\geq -\log\Big(\sum_{g\in\mathcal{G}} p_G(g)\, \mathbb{E}_{K\mid G=g,E=e}\big[p_{K\mid GE}(K \mid g,e)\big]\Big), \qquad (4.45)
\end{aligned}
$$

where the last inequality follows from the convexity of the function x ↦ −log x and
Jensen's inequality. Now, let S_1 ∈ {0, 1}^n and S_2 ∈ {0, 1}^n be two random variables that
are independent of each other and independent of G, and which are distributed according to
p_{S|E=e}. Then,

$$
\begin{aligned}
P[G(S_1) = G(S_2) \mid G=g] &= \sum_{\kappa\in\{0,1\}^k} p_{K\mid GE}(\kappa \mid g,e)\, p_{K\mid GE}(\kappa \mid g,e)\\
&= \mathbb{E}_{K\mid G=g,E=e}\big[p_{K\mid GE}(K \mid g,e)\big],
\end{aligned}
$$

and we can rewrite inequality (4.45) as

$$H_c(K \mid G, E=e) \geq -\log P[G(S_1) = G(S_2)]. \qquad (4.46)$$

We now develop an upper bound for P[G(S_1) = G(S_2)]. By the law of total probability,

$$
\begin{aligned}
P[G(S_1) = G(S_2)] &= P[G(S_1) = G(S_2) \mid S_1 = S_2]\, P[S_1 = S_2]\\
&\quad + P[G(S_1) = G(S_2) \mid S_1 \neq S_2]\, P[S_1 \neq S_2]. \qquad (4.47)
\end{aligned}
$$

Note that

$$P[G(S_1) = G(S_2) \mid S_1 = S_2] \leq 1 \quad\text{and}\quad P[S_1 \neq S_2] \leq 1.$$

In addition, by virtue of the definition of the collision entropy,

$$P[S_1 = S_2] = \sum_{s\in\{0,1\}^n} p_{S\mid E=e}(s)^2 = 2^{-H_c(S \mid E=e)}.$$

Finally, because the hash function G is chosen in a universal family, it holds that

$$P[G(S_1) = G(S_2) \mid S_1 \neq S_2] \leq 2^{-k}.$$

On substituting these inequalities into (4.47), we obtain

$$P[G(S_1) = G(S_2)] \leq 2^{-H_c(S \mid E=e)} + 2^{-k} \leq 2^{-k}\left(1 + 2^{k-c}\right), \qquad (4.48)$$

where the last inequality follows from the assumption H_c(S|E = e) ≥ c. On substituting
(4.48) into (4.46) and using the fact that ln(1 + x) ≤ x for all x > −1, we obtain

$$H_c(K \mid G, E=e) \geq k - \frac{2^{k-c}}{\ln 2}.$$

Since H(K|G, E = e) ≤ k by definition, Theorem 4.4 states that it is possible to
distill a secret key of k bits with hash functions, provided that their output size is small
enough (k < c). The output sequence size depends directly on the a-priori uncertainty
of the eavesdropper, which must be measured in terms of the collision entropy. Since 
this result is the essential tool that we use to analyze the achievable secret-key rates 
of sequential secret-key distillation strategies, it is worthwhile discussing it in detail. 
Note that, although the hash function G is chosen at random, the actual choice is known 
to Eve and this is reflected by the conditioning on G in the entropy; however, the 
theorem provides only a lower bound on H(K|G, E = e), which is an average over all 
possible choices of hash functions in G. Consequently, for a specific choice of g € G, 
the entropy H(K|G = g, E = e) might be significantly different from k, even if k ≪ c;
luckily, this happens with negligible probability. Most importantly, Theorem 4.4 provides 
an explicit privacy-amplification technique because it shows that it suffices to choose 
a hash function at random in a universal family. This is in sharp contrast with the 
random-coding argument used in Section 4.2.2, which guarantees only the existence 
of a suitable key-distillation function. Finally, we emphasize that Theorem 4.4 bounds 
the entropy H(K|G, E = e), which is a stronger result than a bound on the entropy rate 
(1/n)H(K|G, E = e). 
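Theorem 4.4 can be checked exactly on a toy case. The Python sketch below assumes, purely for illustration, that Eve's knowledge E = e only tells her that S is uniform over a fixed set of 2^c sequences, so that H_c(S|E = e) = c. It uses the matrix family ℋ_1 of Example 4.5 and averages H(K|G = g, E = e) over all hash functions g. By the theorem, the result should be no smaller than k − 2^{k−c}/ln 2.

```python
import itertools
import math


def parity(v: int) -> int:
    """Parity of the bits of v."""
    return bin(v).count("1") & 1


def hash_entropy(rows: tuple[int, ...], support: list[int]) -> float:
    """H(K | G = g, E = e) in bits when S is uniform over `support` and K = Ms."""
    counts: dict[int, int] = {}
    for s in support:
        key = sum(parity(row & s) << i for i, row in enumerate(rows))
        counts[key] = counts.get(key, 0) + 1
    total = len(support)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def average_key_entropy(n: int, k: int, support: list[int]) -> float:
    """H(K | G, E = e): average of H(K | G = g, E = e) over all M in GF(2)^{k x n}."""
    matrices = list(itertools.product(range(2 ** n), repeat=k))
    return sum(hash_entropy(m, support) for m in matrices) / len(matrices)


if __name__ == "__main__":
    n, k = 6, 2
    support = list(range(16))          # hypothetical: Eve knows S lies in these 16 sequences
    c = math.log2(len(support))        # H_c(S | E = e) = 4 bits for a uniform S
    print(average_key_entropy(n, k, support), k - 2 ** (k - c) / math.log(2))
```

In this toy case the 16 sequences form a linear subspace, so each restricted hash is linear. The exact average is 465/256 ≈ 1.816 bits, above the guaranteed value 2 − 2^{−2}/ln 2 ≈ 1.639 bits.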

We now have all the tools to prove that sequential key-distillation strategies achieve 
strong secret-key rates. 


Theorem 4.5. Consider a source model with DMS (𝒳𝒴𝒵, p_XYZ) and let β ∈ [0, 1].
All strong secret-key rates R_s that satisfy

$$R_s < \beta I(X;Y) - \min\big(I(X;Z), I(Y;Z)\big)$$

are achievable with sequential secret-key distillation strategies that consist of a
reconciliation protocol with efficiency β and privacy amplification with a universal family of
hash functions. Additionally, these rates are achievable with one-way communication.


Note that the secret-key rates given in Theorem 4.5 are achievable without advantage
distillation. In addition, Theorem 4.5 shows that reconciliation efficiency acts as a penalty
factor that reduces the information between Alice and Bob from I(X;Y) to βI(X;Y)
but leaves the information leaked to the eavesdropper, I(X;Z) or I(Y;Z), unchanged.
This result is particularly relevant because any practical reconciliation protocol has an
efficiency β < 1, which turns out to be one of the main limiting factors of secret-key
rates. Although Proposition 4.5 guarantees the existence of reconciliation protocols
with β arbitrarily close to unity, the design of efficient protocols can be challenging.
The construction of practical yet efficient reconciliation protocols is discussed in greater
detail in Chapter 6.
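The effect of the penalty factor β can be quantified numerically. The Python sketch below assumes a specific satellite source: a uniform bit is broadcast to Alice, Bob, and Eve through independent binary symmetric channels with crossover probabilities p, q, and r. Under that model, I(X;Y) = 1 − H_b(p ⋆ q) with p ⋆ q = p(1 − q) + q(1 − p), and I(X;Z) and I(Y;Z) follow similarly. The sketch evaluates the rate βI(X;Y) − min(I(X;Z), I(Y;Z)) of Theorem 4.5 for given parameters. The function names are hypothetical.

```python
import math


def hb(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def star(a: float, b: float) -> float:
    """Crossover probability of two cascaded binary symmetric channels."""
    return a * (1.0 - b) + b * (1.0 - a)


def sequential_rate(p: float, q: float, r: float, beta: float) -> float:
    """beta I(X;Y) - min(I(X;Z), I(Y;Z)) for the assumed satellite source."""
    i_xy = 1.0 - hb(star(p, q))
    i_xz = 1.0 - hb(star(p, r))
    i_yz = 1.0 - hb(star(q, r))
    return beta * i_xy - min(i_xz, i_yz)


if __name__ == "__main__":
    print(sequential_rate(0.2, 0.2, 0.15, 1.0))    # parameters of Figure 4.6
    print(sequential_rate(0.05, 0.05, 0.3, 0.9))
```

With p = q = 0.2 and r = 0.15, the parameters of Figure 4.6, the value is negative even for β = 1. This is consistent with the use of advantage distillation for that source. With p = q = 0.05 and r = 0.3, about 0.40 bits per observation remain for β = 0.9.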

The proof of Theorem 4.5 is involved because we need to establish a lower bound
for Eve's collision entropy about Alice and Bob's common sequence S before we can
apply Theorem 4.4. There is no obvious bound since Eve's total knowledge consists not
only of the observations of the source Z^n but also of the public messages A^r and B^r
exchanged during the reconciliation phase.

Proof of Theorem 4.5. Fix an integer n and let ε > 0. Let k be an integer to be determined
later. We consider a sequential key-distillation strategy 𝒮_n that consists of


• a direct reconciliation protocol ℛ_n with efficiency β, with which Alice sends messages
A^r to Bob; Proposition 4.5 guarantees the existence of a protocol such that Bob's estimate
X̂^n of X^n satisfies P[X̂^n ≠ X^n | ℛ_n] ≤ δ_ε(n);

• privacy amplification based on a universal family of hash functions with output size
k, at the end of which Alice computes her key K = G(X^n), while Bob computes
K̂ = G(X̂^n).


Note that P[K̂ ≠ K | 𝒮_n] ≤ P[X̂^n ≠ X^n | ℛ_n] ≤ δ_ε(n) and that the strategy uses only
one-way communication from Alice to Bob. The total information available to Eve after
reconciliation consists of her observation Z^n of the DMS (𝒳𝒴𝒵, p_XYZ), the public
messages A^r exchanged during the reconciliation protocol, and the hash function G
chosen for privacy amplification. The strategy 𝒮_n is also known to Eve, but we omit the
conditioning on 𝒮_n in order to simplify the notation. We show that, for a suitable choice
of the output size k,

$$k \geq H(K \mid Z^nA^rG) \geq k - \delta_\epsilon(n).$$

This result will follow from Theorem 4.4, provided that we establish a lower bound
for the collision entropy H_c(X^n|Z^n = z^n, A^r = a^r). This is not straightforward because
the collision entropy depends on the specific operation of the reconciliation protocol;
nevertheless, we circumvent this difficulty in two steps.

First, we will relate H_c(X^n|Z^n = z^n, A^r = a^r) to H_c(X^n|Z^n = z^n) by means of the
following lemma, whose proof is relegated to the appendix at the end of this chapter.


Lemma 4.3 (Cachin). Let S ∈ 𝒮 and U ∈ 𝒰 be two discrete random variables with
joint distribution p_SU. For any r > 0, define the function χ : 𝒰 → {0, 1} as

$$\chi(u) \triangleq \begin{cases} 1 & \text{if } H_c(S) - H_c(S \mid U=u) \leq \log|\mathcal{U}| + 2r + 2,\\ 0 & \text{otherwise.}\end{cases}$$

Then, P_U[χ(U) = 1] ≥ 1 − 2^{−r}.


Lemma 4.3 shows that, with high probability, the decrease in collision entropy caused
by conditioning on U is bounded by log|𝒰| + 2r + 2, which does not depend on the
exact correlation between S and U.

Next, we will lower bound H_c(X^n|Z^n = z^n) by a term on the order of nH(X|Z).
Essentially, this result follows from the fact that, for n large enough, the realizations 
of the random source (X”, Z”) are typical and almost uniformly distributed in the joint 
typical set. This idea is formalized in the following lemma, whose proof is again relegated 
to the appendix. 


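Lemma 4.4. Consider a DMS (𝒳𝒵, p_XZ) and define the random variable Θ as

$$\Theta \triangleq \begin{cases} 1 & \text{if } (X^n, Z^n) \in \mathcal{T}_\epsilon^n(XZ) \text{ and } Z^n \in \mathcal{T}_\epsilon^n(Z),\\ 0 & \text{otherwise.}\end{cases}$$

Then, for n sufficiently large, P[Θ = 1] ≥ 1 − 2^{−√n}. Moreover, if z^n ∈ 𝒯_ε^n(Z),

$$
\begin{aligned}
H_c(X^n \mid Z^n = z^n, \Theta = 1) &\geq H_\infty(X^n \mid Z^n = z^n, \Theta = 1)\\
&\geq n\big(H(X \mid Z) - \delta(\epsilon)\big) - \delta_\epsilon(n).
\end{aligned}
$$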


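To leverage the results of Lemma 4.3 and Lemma 4.4, we start by defining the random
variables Υ (a function of A^r) and Θ (a function of X^n and Z^n) as follows:

$$\Upsilon \triangleq \begin{cases} 1 & \text{if } H_c(X^n \mid Z^n = z^n) - H_c(X^n \mid Z^n = z^n, A^r) \leq \log|\mathcal{A}|^r + 2\sqrt{n} + 2,\\ 0 & \text{otherwise;}\end{cases}$$

$$\Theta \triangleq \begin{cases} 1 & \text{if } (X^n, Z^n) \in \mathcal{T}_\epsilon^n(XZ) \text{ and } Z^n \in \mathcal{T}_\epsilon^n(Z),\\ 0 & \text{otherwise.}\end{cases}$$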


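Lemma 4.3 guarantees that P[Υ = 1] ≥ 1 − 2^{−√n} and Lemma 4.4 ensures that
P[Θ = 1] ≥ 1 − 2^{−√n}; therefore, by the union bound,

$$P[\Theta = 1, \Upsilon = 1] \geq 1 - 2\cdot 2^{-\sqrt{n}}.$$

Consequently, we can lower bound H(K|GZ^nA^r) as

$$
\begin{aligned}
H(K \mid GZ^nA^r) &\geq H(K \mid GZ^nA^r\Theta\Upsilon)\\
&\geq P[\Theta = 1, \Upsilon = 1]\, H(K \mid GZ^nA^r, \Theta = 1, \Upsilon = 1)\\
&\geq \left(1 - 2\cdot 2^{-\sqrt{n}}\right) H(K \mid GZ^nA^r, \Theta = 1, \Upsilon = 1). \qquad (4.49)
\end{aligned}
$$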


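To bound H(K|GZ^nA^r, Θ = 1, Υ = 1) with Theorem 4.4, it suffices to lower bound the
collision entropy

$$H_c(X^n \mid G, Z^n = z^n, A^r = a^r, \Theta = 1, \Upsilon = 1)$$

for any realization z^n ∈ 𝒯_ε^n(Z) and a^r. By virtue of the definition of Υ,

$$
\begin{aligned}
&H_c(X^n \mid Z^n = z^n, A^r = a^r, \Theta = 1, \Upsilon = 1)\\
&\qquad \geq H_c(X^n \mid Z^n = z^n, \Theta = 1) - \log|\mathcal{A}|^r - 2\sqrt{n} - 2. \qquad (4.50)
\end{aligned}
$$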


The quantity log|𝒜|^r represents the number of bits required to describe the messages
exchanged during the reconciliation phase, which we can express in terms of the
reconciliation efficiency β as

$$\log|\mathcal{A}|^r = H(X^n) - n\beta I(X;Y).$$

Therefore, we can rewrite (4.50) as

$$
\begin{aligned}
&H_c(X^n \mid Z^n = z^n, A^r = a^r, \Theta = 1, \Upsilon = 1)\\
&\qquad \geq H_c(X^n \mid Z^n = z^n, \Theta = 1) - H(X^n) + n\beta I(X;Y) - 2\sqrt{n} - 2. \qquad (4.51)
\end{aligned}
$$


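Next, notice that Lemma 4.4 ensures

$$H_c(X^n \mid Z^n = z^n, \Theta = 1) \geq n\big(H(X \mid Z) - \delta(\epsilon)\big) - \delta_\epsilon(n). \qquad (4.52)$$

Combining (4.51) and (4.52) yields

$$
\begin{aligned}
&H_c(X^n \mid Z^n = z^n, A^r = a^r, \Theta = 1, \Upsilon = 1)\\
&\qquad \geq n\big(H(X \mid Z) - \delta(\epsilon)\big) - H(X^n) + n\beta I(X;Y) - 2\sqrt{n} - 2 - \delta_\epsilon(n)\\
&\qquad = n\big(\beta I(X;Y) - I(X;Z) - \delta(\epsilon)\big) - 2\sqrt{n} - 2 - \delta_\epsilon(n). \qquad (4.53)
\end{aligned}
$$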


Hence, if we set the output size k of the hash function to be less than the lower bound
in (4.53) by √n, say

$$k = \Big\lfloor n\big(\beta I(X;Y) - I(X;Z) - \delta(\epsilon)\big) - 3\sqrt{n} - 2 - \delta_\epsilon(n)\Big\rfloor, \qquad (4.54)$$


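then Theorem 4.4 ensures that

$$H(K \mid G, Z^n = z^n, A^r = a^r, \Theta = 1, \Upsilon = 1) \geq k - \frac{2^{-\sqrt{n}}}{\ln 2} = k - \delta(n). \qquad (4.55)$$

On substituting (4.55) back into (4.49), we finally obtain

$$H(K \mid GZ^nA^r) \geq \left(1 - 2\cdot 2^{-\sqrt{n}}\right)\big(k - \delta(n)\big) = k - \delta(n), \qquad (4.56)$$

and, consequently,

$$I(K; GZ^nA^r) = H(K) - H(K \mid GZ^nA^r) \leq \delta(n).$$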


Notice that the corresponding secret-key rate is

$$R \triangleq \frac{k}{n} = \beta I(X;Y) - I(X;Z) - \delta(\epsilon) - \delta(n). \qquad (4.57)$$

Hence, we have proved the existence of a (2^{nR}, n) sequential key-distillation strategy
𝒮_n with rate given by (4.57) and based on a direct reconciliation protocol of efficiency
β and privacy amplification with a universal family of hash functions such that

$$P_e(\mathcal{S}_n) \leq \delta_\epsilon(n), \quad L(\mathcal{S}_n) \leq \delta_\epsilon(n), \quad\text{and}\quad U(\mathcal{S}_n) \leq \delta_\epsilon(n).$$


Hence, βI(X;Y) − I(X;Z) − δ(ε) is an achievable strong secret-key rate. Since ε can
be chosen arbitrarily small, all strong secret-key rates below βI(X;Y) − I(X;Z) are
achievable, as well.

The achievability of all strong secret-key rates below βI(X;Y) − I(Y;Z) follows from
the same arguments on reversing the roles of Alice and Bob and considering a reverse
reconciliation protocol. ∎


Although both direct and reverse reconciliation protocols can operate close to the 
reconciliation capacity RS, the secret-key rates obtained after privacy amplification are 
different. In the proof of Theorem 4.5, the secret-key rates below BI(X; Y) — I(X; Z) 
are achievable with a direct reconciliation protocol, whereas the secret-key rates below 
BI(X; Y) — I(Y; Z) are achievable with a reverse reconciliation protocol. Intuitively, a 
direct reconciliation protocol uses Alice’s observations as the reference to distill the key 
and the information leaked to Eve is therefore I(X; Z). In contrast, a reverse reconciliation 
uses Bob’s observations as the reference and the information leaked to Eve is then I(Y; Z). 
Remark 4.10. Privacy amplification with a universal family of hash functions is a special case of privacy amplification with an almost universal family of hash functions. For $\gamma \geq 1$, a $\gamma$-almost universal family $\mathcal{G}$ of hash functions $g: \mathcal{A} \to \mathcal{B}$ is such that

$$\forall x_1, x_2 \in \mathcal{A},\ x_1 \neq x_2: \quad \mathbb{P}_G[G(x_1) = G(x_2)] \leq \frac{\gamma}{|\mathcal{B}|},$$

where $G$ is the random variable that represents the choice of a function $g$ uniformly at random in $\mathcal{G}$. It is possible to show that sequential key-distillation strategies based on $\gamma$-almost universal families of hash functions can achieve all strong secret-key rates below

$$\beta I(X;Y) - \min\bigl(I(X;Z), I(Y;Z)\bigr) - \log\gamma,$$

which is lower than the rate given in Theorem 4.5 if $\gamma > 1$. Nevertheless, almost-universal families of hash functions might be preferred to universal families of hash functions in practice because of their greater flexibility and lower complexity. A more detailed discussion of privacy amplification based on almost-universal families of hash functions can be found in the textbook of Van Assche [38].
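As an illustration of the collision condition above, the Python sketch below uses random binary matrices, $g_M(x) = Mx$ over GF(2), a standard example of a universal family ($\gamma = 1$): for $x_1 \neq x_2$, $Mx_1 = Mx_2$ holds if and only if $M(x_1 \oplus x_2) = 0$, which happens with probability exactly $2^{-k}$ when $M$ is a uniformly random $k \times n$ matrix. The function names and parameters are hypothetical illustrations; the sketch only estimates the collision probability empirically.

```python
import random


def random_matrix_hash(n: int, k: int, rng: random.Random):
    """Draw g(x) = M x over GF(2), with M a uniformly random k x n binary matrix.

    Each row of M is stored as an n-bit integer; inputs are n-bit integers
    and outputs are k-bit integers.
    """
    rows = [rng.getrandbits(n) for _ in range(k)]

    def g(x: int) -> int:
        out = 0
        for i, row in enumerate(rows):
            # Output bit i is the GF(2) inner product of row i with x.
            out |= (bin(row & x).count("1") & 1) << i
        return out

    return g


def collision_rate(x1: int, x2: int, n: int, k: int, trials: int, seed: int = 0) -> float:
    """Empirical estimate of P_G[G(x1) = G(x2)] with G drawn uniformly from the family."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = random_matrix_hash(n, k, rng)
        hits += g(x1) == g(x2)
    return hits / trials


estimate = collision_rate(0b1011, 0b0110, n=4, k=3, trials=20000)
print(estimate, 1 / 2 ** 3)
```

For this family, the penalty term $\log\gamma$ in the rate above vanishes; an almost-universal family with a shorter description would trade a small $\log\gamma$ for lower complexity.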


Corollary 4.2. The strong secret-key capacity $\overline{C}_s^{\text{SM}}$ of a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ is equal to its weak secret-key capacity $C_s^{\text{SM}}$. In addition, all strong secret-key rates below $C_s^{\text{SM}}$ are achievable with sequential key-distillation strategies.


Proof. This result follows directly from Proposition 4.2 and Theorem 4.5. According to Proposition 4.2, an advantage-distillation protocol $\mathcal{D}_n$ can be used to transform $n$ realizations of the original DMS $(\mathcal{XYZ}, p_{XYZ})$ into a single realization of a new DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ for which

$$I(X';Y') - I(X';Z') > 0 \quad\text{or}\quad I(X';Y') - I(Y';Z') > 0.$$

For this new source and any $\epsilon > 0$, Theorem 4.5 ensures that the strong secret-key rate

$$R' = \max\bigl(I(X';Y') - I(X';Z'),\ I(X';Y') - I(Y';Z')\bigr) - \delta(\epsilon)$$

is achievable, where $R'$ is expressed in bits per observation of the DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$. The rate in bits per observation of the original DMS $(\mathcal{XYZ}, p_{XYZ})$ is then

$$R_s = \frac{1}{n}\Bigl(\max\bigl(I(X';Y') - I(X';Z'),\ I(X';Y') - I(Y';Z')\bigr) - \delta(\epsilon)\Bigr) \geq R(\mathcal{D}_n) - \delta(\epsilon),$$

which can be made as close as desired to the advantage-distillation capacity $D^{\text{SM}}$. By Proposition 4.2, $D^{\text{SM}} = C_s^{\text{SM}}$ and, because $R_s$ is a strong secret-key rate, we obtain $\overline{C}_s^{\text{SM}} \geq C_s^{\text{SM}}$ and hence $\overline{C}_s^{\text{SM}} = C_s^{\text{SM}}$. ∎


Theorem 4.5 and Corollary 4.2 have far-reaching implications. First, the fact that the strong secret-key capacity $\overline{C}_s^{\text{SM}}$ is equal to the weak secret-key capacity $C_s^{\text{SM}}$ is reassuring because it suggests that the fundamental limit of secret-key generation is fairly robust with respect to the actual choice of secrecy condition. Second, since sequential key-distillation strategies can achieve all rates below $C_s^{\text{SM}}$, we know that there is no loss of optimality in handling the reliability and secrecy requirements independently. Whether sequential key-distillation strategies can achieve secret-key rates close to $C_s^{\text{SM}}$ depends on our ability to design advantage-distillation protocols, but at least we have an explicit procedure for privacy amplification, and we will see in Chapter 6 how to design efficient reconciliation protocols. Finally, the proof of Corollary 4.2 highlights the importance of two-way communications for secret-key agreement as a means to distill an advantage over the eavesdropper. Two-way communication is required for advantage distillation, although reconciliation and privacy amplification can be implemented with one-way communication only.


Privacy amplification with extractors 

We conclude our study of privacy amplification by discussing an alternative to universal 
families of hash functions and by analyzing privacy amplification with a class of functions 
called extractors. In essence, the analysis and the results are identical to those obtained 
earlier, but they exploit the min-entropy in place of the collision entropy. The only 
difference between privacy amplification with hash functions and privacy amplification 
with extractors is the amount of communication over the public channel. 


Definition 4.19. A function $g: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^k$ is called a $(\gamma, \epsilon)$-extractor if, for any random variable $S \in \{0,1\}^n$ with min-entropy $H_\infty(S) \geq \gamma n$ and a random variable $U_d$ uniformly distributed over $\{0,1\}^d$, the variational distance between the random variable $(U_d, g(S, U_d)) \in \{0,1\}^{d+k}$ and the random variable $U_{d+k}$ with uniform distribution over $\{0,1\}^{d+k}$ satisfies

$$\mathbb{V}\bigl((U_d, g(S, U_d)), U_{d+k}\bigr) \leq \epsilon.$$

In other words, an extractor is a function that converts a sequence of $n$ bits with arbitrary distribution into a sequence of $k$ bits with almost uniform distribution using $d$ bits of randomness as a catalyst. If $d < k$, that is, if the extractor outputs more uniform randomness than is used at the input, this operation can be thought of as a way of “extracting” uniform randomness from the random variable $S$. In practice, an extractor is useful if $d \ll k$, which means that little extra randomness is necessary for the extraction. The existence of such extractors is guaranteed by the following proposition, which we state without proof.
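The definition only fixes the interface $g(S, U_d)$ of an extractor. As a concrete but deliberately simple instance, the Python sketch below implements Toeplitz hashing over GF(2), a universal family that acts as an extractor via the leftover hash lemma. Its seed length is $d = n + k - 1$, so it does not achieve $d \ll k$; it only illustrates how the seed $U_d$ selects the function applied to $S$. The name `toeplitz_extract` and the bit-list representation are our own choices.

```python
import random


def toeplitz_extract(s_bits, seed_bits, k):
    """Return the k output bits g(s, u) = T_u s over GF(2).

    T_u is the k x n Toeplitz matrix with T_u[i][j] = seed_bits[i - j + n - 1],
    so the seed must contain exactly n + k - 1 bits.
    """
    n = len(s_bits)
    if len(seed_bits) != n + k - 1:
        raise ValueError("seed must have n + k - 1 bits")
    out = []
    for i in range(k):
        acc = 0
        for j in range(n):
            # Entry (i, j) depends only on i - j, which makes the matrix Toeplitz.
            acc ^= seed_bits[i - j + n - 1] & s_bits[j]
        out.append(acc)
    return out


rng = random.Random(7)
n, k = 16, 4
s = [rng.getrandbits(1) for _ in range(n)]
u = [rng.getrandbits(1) for _ in range(n + k - 1)]
print(toeplitz_extract(s, u, k))
```

Because the output is a linear function of $S$ for each seed, the map is easy to compute; the price is a seed longer than the output, which is exactly what the extractors of the next proposition avoid.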


Proposition 4.8 (Vadhan). For any $\epsilon > 0$ and $\gamma \in (0,1)$, there exists a $(\gamma, \epsilon)$-extractor $g: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^k$ with

$$k = \gamma n - 2\log\Bigl(\frac{1}{\epsilon}\Bigr) - O(1)$$

and

$$d = O\Bigl(\log^2\Bigl(\frac{n}{\epsilon}\Bigr)\log(\gamma n)\Bigr).$$

In other words, for $n$ large enough, there exist extractors that extract almost the entire min-entropy of the input $S$ and require a comparatively negligible amount of uniform randomness. The existence of such extractors allows us to prove the counterpart of Theorem 4.4 with extractors in place of hash functions.
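To see how the parameters of Proposition 4.8 behave with the choice $\epsilon = 2^{-\sqrt{n}/\log n}$ made in the proof of Theorem 4.6, the Python sketch below evaluates $d/n$ and $(\gamma n - k)/n$. The hidden constants are unknown, so the sketch assumes, purely for illustration, that the $O(\cdot)$ in $d$ has constant `c_d = 1`, drops the $O(1)$ term in $k$, and uses base-2 logarithms; `extractor_overheads` is a hypothetical helper.

```python
import math


def extractor_overheads(n: int, gamma: float, c_d: float = 1.0):
    """Return (d/n, (gamma*n - k)/n) for eps = 2**(-sqrt(n)/log2(n)).

    Illustrative assumptions: the O(.) in d is replaced by c_d * (...), the O(1)
    term in k is dropped, and all logarithms are in base 2.
    """
    log_inv_eps = math.sqrt(n) / math.log2(n)      # log2(1/eps)
    log_n_over_eps = math.log2(n) + log_inv_eps    # log2(n/eps)
    d = c_d * log_n_over_eps ** 2 * math.log2(gamma * n)
    k = gamma * n - 2 * log_inv_eps
    return d / n, (gamma * n - k) / n


for exponent in (20, 40, 60):
    ratio_d, ratio_k = extractor_overheads(2 ** exponent, gamma=0.5)
    print(f"n = 2^{exponent}: d/n = {ratio_d:.4f}, (gamma*n - k)/n = {ratio_k:.2e}")
```

Under these assumptions both ratios decrease with $n$, which is consistent with $d = n\delta(n)$ and $k = n(\gamma - \delta(n))$; note, however, that $d/n$ decays only slowly, roughly like $1/\log n$.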


Theorem 4.6 (Maurer and Wolf). Let $S \in \{0,1\}^n$ be the random variable that represents the common sequence shared by Alice and Bob, and let $E$ be the random variable that represents the total knowledge about $S$ available to Eve. Let $e$ be a particular realization of $E$. If Alice and Bob know the conditional min-entropy $H_\infty(S|E=e)$ to be at least $\gamma n$ for some $\gamma \in (0,1)$, then there exists a function $g: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^k$ with

$$d \leq n\delta(n) \quad\text{and}\quad k \geq n\bigl(\gamma - \delta(n)\bigr),$$

such that, if $U_d$ is a random variable with uniform distribution on $\{0,1\}^d$ and Alice and Bob choose $K = g(S, U_d)$ as their secret key, then

$$H(K|U_d, E = e) \geq k - \delta(n).$$
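Proof. Let $\epsilon \triangleq 2^{-\sqrt{n}/\log n}$ and let $K_e \in \{0,1\}^k$ be the random variable with distribution

$$p_{K_e} \triangleq p_{g(S,U_d)|E=e}.$$

According to Proposition 4.8, there exists a $(\gamma, \epsilon)$-extractor $g: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^k$ with

$$d = O\Bigl(\log^2\Bigl(\frac{n}{\epsilon}\Bigr)\log(\gamma n)\Bigr) = n\delta(n),$$

$$k = \gamma n - 2\log\Bigl(\frac{1}{\epsilon}\Bigr) - O(1) = n\bigl(\gamma - \delta(n)\bigr),$$

and such that $\mathbb{V}\bigl((U_d, K_e), U_{d+k}\bigr) \leq 2^{-\sqrt{n}/\log n}$. Note that we can write the uniform distribution $p_{U_{d+k}}$ over $\{0,1\}^{d+k}$ as the product of the uniform distribution $p_{U_d}$ over $\{0,1\}^d$ with the uniform distribution $p_{U_k}$ over $\{0,1\}^k$; hence,

$$\begin{aligned}
\mathbb{V}\bigl((U_d, K_e), U_{d+k}\bigr) &= \sum_{u,s} \bigl| p_{U_d}(u)\, p_{K_e|U_d}(s|u) - p_{U_d}(u)\, p_{U_k}(s) \bigr| \\
&= \sum_{u,s} p_{U_d}(u) \bigl| p_{K_e|U_d}(s|u) - 2^{-k} \bigr| \\
&= \mathbb{E}_{U_d}\Bigl[\sum_s \bigl| p_{K_e|U_d}(s|U_d) - 2^{-k} \bigr|\Bigr].
\end{aligned}$$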


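By Markov’s inequality,

$$\mathbb{P}_{U_d}\Bigl[\sum_s \bigl| p_{K_e|U_d}(s|U_d) - 2^{-k} \bigr| \geq 2^{-\sqrt{n}/(2\log n)}\Bigr] \leq \frac{\mathbb{E}_{U_d}\bigl[\sum_s \bigl| p_{K_e|U_d}(s|U_d) - 2^{-k} \bigr|\bigr]}{2^{-\sqrt{n}/(2\log n)}} \leq 2^{-\sqrt{n}/(2\log n)}. \tag{4.58}$$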


In other words, with high probability, the realization $u_d$ of $U_d$ is such that the variational distance between $p_{K_e|U_d = u_d}$ and a uniform distribution over $\{0,1\}^k$ is small. Formally, if we define the random variable $\Theta$ as a function of $U_d$ by

$$\Theta = \begin{cases} 1 & \text{if } \mathbb{V}\bigl(p_{K_e|U_d}, p_{U_k}\bigr) < 2^{-\sqrt{n}/(2\log n)},\\ 0 & \text{otherwise,}\end{cases}$$

then, by (4.58),

$$\mathbb{P}[\Theta = 1] \geq 1 - 2^{-\sqrt{n}/(2\log n)}. \tag{4.59}$$
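We now lower bound $H(K|U_d, E=e)$ as

$$H(K|U_d, E=e) \geq H(K|U_d, \Theta, E=e) \geq \mathbb{P}[\Theta=1]\, H(K_e|U_d, \Theta=1). \tag{4.60}$$

Given $\Theta = 1$, the variational distance $\mathbb{V}(p_{K_e|U_d}, p_{U_k})$ is less than $2^{-\sqrt{n}/(2\log n)}$. Note that the function $x \mapsto x\log(2^k/x)$ is increasing for $x \in (0, 2^k\mathrm{e}^{-1})$, where $\mathrm{e}$ denotes Euler’s number (not the realization $e$); therefore, for $n$ large enough, we have, according to Proposition 2.1,

$$\begin{aligned}
\bigl| H(K_e|U_d, \Theta=1) - k \bigr| &\leq 2^{-\sqrt{n}/(2\log n)} \log\Bigl(\frac{2^k}{2^{-\sqrt{n}/(2\log n)}}\Bigr) \\
&= 2^{-\sqrt{n}/(2\log n)}\Bigl(k + \frac{\sqrt{n}}{2\log n}\Bigr) \\
&= \delta(n).
\end{aligned}\tag{4.61}$$

On combining (4.59) and (4.61) in (4.60), we finally obtain

$$H(K|U_d, E=e) \geq k - \delta(n),$$

which is the desired result. ∎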


We can now establish the counterpart of Theorem 4.5. 


Theorem 4.7. Consider a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ and let $\beta \in [0,1]$. All strong secret-key rates $R_s$ that satisfy

$$R_s < \beta I(X;Y) - \min\bigl(I(X;Z), I(Y;Z)\bigr)$$

are achievable with sequential secret-key distillation strategies that consist of a reconciliation protocol with efficiency $\beta$ and privacy amplification with extractors. Additionally, these rates are achievable with one-way communication.


Sketch of proof. The proof of Theorem 4.7 follows exactly that of Theorem 4.5. The 
only difference is the use of extractors instead of hash functions, which requires the 
use of the min-entropy instead of the collision entropy. Notice that Lemma 4.4 already 
establishes a lower bound for the min-entropy, hence we need just the counterpart of 
Lemma 4.3 for the min-entropy. As shown in the appendix to this chapter, the following 
result holds. 


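Lemma 4.5. Let $S \in \mathcal{S}$ and $U \in \mathcal{U}$ be two discrete random variables with joint distribution $p_{SU}$. For any $r > 0$, define the function $\chi: \mathcal{U} \to \{0,1\}$ as

$$\chi(u) = \begin{cases} 1 & \text{if } H_\infty(S) - H_\infty(S|U=u) \leq \log|\mathcal{U}| + r,\\ 0 & \text{otherwise.}\end{cases}$$

Then, $\mathbb{P}_U[\chi(U) = 1] \geq 1 - 2^{-r}$.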
Essentially, Lemma 4.5 states that, with high probability, conditioning on $U$ reduces the min-entropy by at most roughly $\log|\mathcal{U}|$. This allows us to relate the min-entropy before and after reconciliation independently of the exact operation of the reconciliation protocol.
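Because Lemma 4.5 holds for every joint distribution, it can be checked exhaustively on small examples. The Python sketch below computes $\mathbb{P}_U[\chi(U) = 1]$ for a given joint pmf $p_{SU}$ and compares it with $1 - 2^{-r}$, using base-2 logarithms. The helper names (`min_entropy`, `check_lemma_4_5`, `random_joint_pmf`) are hypothetical, and the skewed random pmf is our own choice, made so that some values of $U$ are rare.

```python
import math
import random


def min_entropy(probs):
    """H_inf in bits: -log2 of the largest probability."""
    return -math.log2(max(probs))


def check_lemma_4_5(p_su, r):
    """For a joint pmf p_su[s][u], return (P_U[chi(U) = 1], 1 - 2**-r), where
    chi(u) = 1 iff H_inf(S) - H_inf(S|U=u) <= log2|U| + r."""
    n_s, n_u = len(p_su), len(p_su[0])
    p_s = [sum(row) for row in p_su]
    p_u = [sum(p_su[s][u] for s in range(n_s)) for u in range(n_u)]
    h_s = min_entropy(p_s)
    prob_chi_one = 0.0
    for u in range(n_u):
        if p_u[u] == 0:
            continue  # values of U with zero probability do not contribute
        h_s_given_u = min_entropy([p_su[s][u] / p_u[u] for s in range(n_s)])
        if h_s - h_s_given_u <= math.log2(n_u) + r:
            prob_chi_one += p_u[u]
    return prob_chi_one, 1 - 2 ** (-r)


def random_joint_pmf(n_s, n_u, rng):
    """Random joint pmf with skewed weights, so that some values of U are rare."""
    weights = [[rng.random() ** 8 for _ in range(n_u)] for _ in range(n_s)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


rng = random.Random(5)
print(check_lemma_4_5(random_joint_pmf(4, 8, rng), r=1.0))
```

The check mirrors the proof given in the appendix: $\chi(u) = 0$ can only happen when $p_U(u) < 2^{-r}/|\mathcal{U}|$, and such values of $U$ carry total probability less than $2^{-r}$.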


Secret-key capacity of the channel model 


In this section, we turn our attention to the channel model for secret-key agreement 
and study the secret-key capacity of a channel model. The secret-key capacity of a 
channel model is sometimes called the secrecy capacity with public discussion, because 
a channel model can be viewed as a WTC enhanced by a public channel between Alice 
and Bob. However, this denomination is slightly misleading because, in contrast to the 
secrecy capacity, the secrecy capacity with public discussion characterizes a secret-key 
rate, not a secure message rate. In this book, we restrict ourselves to the term secret-key 
capacity and it will be clear from the context whether this refers to a source model or a 
channel model. 


Definition 4.20. The weak secret-key capacity of a channel model with DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is

$$C_s^{\text{CM}} \triangleq \sup\{R : R \text{ is an achievable weak secret-key rate}\}.$$

Similarly, the strong secret-key capacity is

$$\overline{C}_s^{\text{CM}} \triangleq \sup\{R : R \text{ is an achievable strong secret-key rate}\}.$$


It follows from the definition of achievable rates that $\overline{C}_s^{\text{CM}} \leq C_s^{\text{CM}}$. We prove in Section 4.5 that $\overline{C}_s^{\text{CM}} = C_s^{\text{CM}}$, but until then we focus on the weak secret-key capacity. Note that the definition of key-distillation strategies for the channel model is so broad that an exact characterization of the secret-key capacity for a general channel model seems out of reach. Nevertheless, it is possible to develop upper and lower bounds for the secret-key capacity similar to those developed in Theorem 4.1.


Theorem 4.8 (Ahlswede and Csiszár). The secret-key capacity $C_s^{\text{CM}}$ of a channel model satisfies

$$\max\Bigl(\max_{p_X}\bigl(I(X;Y) - I(X;Z)\bigr),\ \max_{p_X}\bigl(I(X;Y) - I(Y;Z)\bigr)\Bigr) \leq C_s^{\text{CM}} \leq \max_{p_X}\min\bigl(I(X;Y), I(X;Y|Z)\bigr).$$


Proof. We derive the lower bound by considering a specific key-distillation strategy, in which the input $X^n$ is chosen i.i.d. according to an arbitrary distribution $p_X$. For this choice of input, the channel model becomes a source model with DMS $(\mathcal{XYZ}, p_{XYZ})$ whose secret-key capacity is $C_s^{\text{SM}}(p_{XYZ})$. Therefore,

$$\begin{aligned}
C_s^{\text{CM}} &\geq \max_{p_X} C_s^{\text{SM}}(p_{XYZ}) \\
&\geq \max\Bigl(\max_{p_X}\bigl(I(X;Y) - I(X;Z)\bigr),\ \max_{p_X}\bigl(I(X;Y) - I(Y;Z)\bigr)\Bigr),
\end{aligned}$$

where the second inequality follows from Theorem 4.1.

We establish the upper bound with a converse argument similar to that used in the proof of Theorem 4.1; the proof is more technical because the inputs of the channel depend on previously exchanged messages and past inputs. Let $R$ be an achievable secret-key rate and let $\epsilon > 0$. For $n$ sufficiently large, there exists a $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ such that

$$\mathbf{P}_e(\mathcal{S}_n) \leq \delta(\epsilon), \quad \frac{1}{n}\mathbf{L}(\mathcal{S}_n) \leq \delta(\epsilon), \quad\text{and}\quad \frac{1}{n}\mathbf{U}(\mathcal{S}_n) \leq \delta(\epsilon). \tag{4.62}$$


In the following, we omit conditioning on $\mathcal{S}_n$ in order to simplify the notation. Fano’s inequality combined with (4.62) ensures that

$$\frac{1}{n}H(K|\hat{K}A^rB^rZ^n) \leq \frac{1}{n}H(K|\hat{K}) \leq \delta\bigl(\mathbf{P}_e(\mathcal{S}_n)\bigr) \leq \delta(\epsilon).$$
n 


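We also introduce the random variables $\Lambda_j$ for $j \in [0,n]$, which represent all public messages exchanged after $Y_j$ and $Z_j$ have been received but before $Y_{j+1}$ and $Z_{j+1}$ are received:

$$\begin{aligned}
\Lambda_0 &\triangleq (A_1, \dots, A_{r_1-1}, B_1, \dots, B_{r_1-1}),\\
\Lambda_j &\triangleq (A_{r_j}, \dots, A_{r_{j+1}-1}, B_{r_j}, \dots, B_{r_{j+1}-1}) \quad\text{for } j \in [1,n].
\end{aligned}$$

With this definition, note that $A^rB^r = \Lambda_0\Lambda^n$. Following the steps used to prove (4.20) for the source model, we can show that

$$\begin{aligned}
R &\leq \frac{1}{n}I(X^nR_X; Y^nR_Y|A^rB^rZ^n) + \delta(\epsilon)\\
&= \frac{1}{n}I(X^nR_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n) + \delta(\epsilon).
\end{aligned}\tag{4.63}$$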
n 


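However, in contrast to the case of the source model, $X_j = h_j(B^{r_j-1}, R_X)$ is a function of $B^{r_j-1}$ and $R_X$; therefore, the inequality simplifies to

$$R \leq \frac{1}{n}I(R_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n) + \delta(\epsilon).$$

Next, we expand $I(R_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n)$ as

$$\begin{aligned}
I(R_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n) &= I(R_X; Y^nR_YZ^n\Lambda_0\Lambda^n) - I(R_X; Z^n\Lambda_0\Lambda^n)\\
&= I(R_X; R_Y\Lambda_0) - I(R_X; \Lambda_0) + I(R_X; Y^nZ^n\Lambda^n|R_Y\Lambda_0) - I(R_X; Z^n\Lambda^n|\Lambda_0)\\
&= I(R_X; R_Y|\Lambda_0) + I(R_X; Y^nZ^n\Lambda^n|R_Y\Lambda_0) - I(R_X; Z^n\Lambda^n|\Lambda_0)
\end{aligned}\tag{4.64}$$

and study each term separately.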
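Since $\Lambda_0 = A^{r_1-1}B^{r_1-1}$, we can apply Lemma 4.2 to $I(R_X; R_Y|\Lambda_0)$ with $S \triangleq R_X$, $T \triangleq R_Y$, $U \triangleq 0$, $r \triangleq r_1 - 1$, $V_i \triangleq A_i$, and $W_i \triangleq B_i$ to obtain

$$I(R_X; R_Y|\Lambda_0) \leq I(R_X; R_Y) = 0, \tag{4.65}$$

where the equality holds because the local randomness $R_X$ of Alice and $R_Y$ of Bob are independent.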


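Next, notice that

$$\begin{aligned}
I(R_X; Y^nZ^n\Lambda^n|R_Y\Lambda_0) &= \sum_{j=1}^n I(R_X; Z_jY_j\Lambda_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1})\\
&= \sum_{j=1}^n \Bigl(I(R_X; Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) + I(R_X; \Lambda_j|R_Y\Lambda_0Z^jY^j\Lambda^{j-1})\Bigr),
\end{aligned}\tag{4.66}$$

and, similarly,

$$I(R_X; Z^n\Lambda^n|\Lambda_0) = \sum_{j=1}^n \Bigl(I(R_X; Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1}) + I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1})\Bigr). \tag{4.67}$$

On substituting (4.65), (4.66), and (4.67) into (4.64), we obtain

$$\begin{aligned}
I(R_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n) &\leq \sum_{j=1}^n \Bigl(I(R_X; Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) - I(R_X; Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1})\Bigr)\\
&\quad + \sum_{j=1}^n \Bigl(I(R_X; \Lambda_j|R_Y\Lambda_0Z^jY^j\Lambda^{j-1}) - I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1})\Bigr).
\end{aligned}\tag{4.68}$$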


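We proceed to bound the terms in each sum separately. First,

$$\begin{aligned}
&I(R_X; Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) - I(R_X; Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1})\\
&\quad = H(Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) - H(Z_jY_j|R_XR_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1})\\
&\qquad - H(Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1}) + H(Z_j|R_X\Lambda_0Z^{j-1}\Lambda^{j-1}).
\end{aligned}\tag{4.69}$$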


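Since conditioning does not increase entropy, we have

$$\begin{aligned}
H(Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) - H(Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1}) &\leq H(Z_jY_j|\Lambda_0Z^{j-1}\Lambda^{j-1}) - H(Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1})\\
&= H(Y_j|Z_j\Lambda_0Z^{j-1}\Lambda^{j-1})\\
&\leq H(Y_j|Z_j).
\end{aligned}\tag{4.70}$$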


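In addition, since $R_XR_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1} \to X_j \to Y_jZ_j$ forms a Markov chain and $X_j = h_j(B^{r_j-1}, R_X)$, we have

$$H(Z_jY_j|R_XR_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) = H(Z_jY_j|X_j) \tag{4.71}$$

and

$$H(Z_j|R_X\Lambda_0Z^{j-1}\Lambda^{j-1}) = H(Z_j|X_j). \tag{4.72}$$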
On substituting (4.70), (4.71), and (4.72) into (4.69), we obtain

$$\begin{aligned}
I(R_X; Z_jY_j|R_Y\Lambda_0Z^{j-1}Y^{j-1}\Lambda^{j-1}) - I(R_X; Z_j|\Lambda_0Z^{j-1}\Lambda^{j-1}) &\leq H(Y_j|Z_j) - H(Z_jY_j|X_j) + H(Z_j|X_j)\\
&= I(X_j; Y_j|Z_j).
\end{aligned}\tag{4.73}$$
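We now turn our attention to the terms in the second sum of (4.68):

$$\begin{aligned}
&I(R_X; \Lambda_j|R_Y\Lambda_0Z^jY^j\Lambda^{j-1}) - I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1})\\
&\quad = I(R_X; \Lambda_jR_YY^j|\Lambda_0Z^j\Lambda^{j-1}) - I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^{j-1}) - I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1})\\
&\quad = I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1}) + I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^j) - I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^{j-1}) - I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1})\\
&\quad = I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^j) - I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^{j-1}).
\end{aligned}$$

By applying Lemma 4.2 again with $S \triangleq R_X$, $T \triangleq R_YY^j$, $U \triangleq \Lambda_0Z^j\Lambda^{j-1}$, and with $V_i$ and $W_i$ the messages of Alice and Bob contained in $\Lambda_j$, we obtain

$$I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^j) \leq I(R_X; R_YY^j|\Lambda_0Z^j\Lambda^{j-1})$$

and, therefore,

$$I(R_X; \Lambda_j|R_Y\Lambda_0Z^jY^j\Lambda^{j-1}) - I(R_X; \Lambda_j|\Lambda_0Z^j\Lambda^{j-1}) \leq 0. \tag{4.74}$$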
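On substituting (4.73) and (4.74) into (4.68), we finally obtain

$$I(R_X; Y^nR_Y|\Lambda_0\Lambda^nZ^n) \leq \sum_{j=1}^n I(X_j; Y_j|Z_j) \leq n\max_{p_X} I(X;Y|Z). \tag{4.75}$$

Hence,

$$R \leq \frac{1}{n}\sum_{j=1}^n I(X_j; Y_j|Z_j) + \delta(\epsilon).$$

It remains to show that $R \leq \frac{1}{n}\sum_{j=1}^n I(X_j; Y_j) + \delta(\epsilon)$. Since

$$\frac{1}{n}H(K|\hat{K}A^rB^r) \leq \frac{1}{n}H(K|\hat{K}) \leq \delta(\epsilon),$$

it suffices to reiterate all the steps leading to (4.75) without the conditioning on $Z^n$ to obtain the desired result. ∎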


Even though Theorem 4.8 does not characterize the secret-key capacity exactly, it 
provides simple bounds that do not depend on auxiliary random variables. 


Example 4.7. For some channel models, the secret-key capacity is even equal to the secrecy capacity. The simplest example of such a situation is a channel model in which the DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is physically degraded. In this case $X \to Y \to Z$ forms a Markov chain, so that $I(X;Y|Z) = I(X;Y) - I(X;Z) \leq I(X;Y)$; the lower and upper bounds of Theorem 4.8 then coincide and

$$C_s^{\text{CM}} = \max_{p_X} I(X;Y|Z) = C_s.$$

In other words, using a wiretap code is an optimal key-distillation strategy for a channel model in which the eavesdropper’s channel is physically degraded with respect to the main channel.
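The coincidence of the two bounds of Theorem 4.8 for a physically degraded channel can also be checked numerically. The Python sketch below assumes, for illustration only, a cascade of two binary symmetric channels $X \to \mathrm{BSC}(p) \to Y \to \mathrm{BSC}(q) \to Z$; it evaluates the lower and upper bounds on a grid of input distributions $p_X = \mathrm{Bern}(a)$ and returns their maxima. The helpers `entropy`, `marginal`, and `degraded_bsc_bounds` are hypothetical names.

```python
import itertools
import math


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginalize a joint pmf over tuples onto the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def degraded_bsc_bounds(p, q, grid=1000):
    """Maxima over p_X = Bern(a), a in {0, 1/grid, ..., 1}, of the lower and upper
    bounds of Theorem 4.8 for the assumed cascade X -> BSC(p) -> Y -> BSC(q) -> Z."""
    best_lower = best_upper = -math.inf
    subsets = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
    for i in range(grid + 1):
        a = i / grid
        joint = {}
        for x, y, z in itertools.product((0, 1), repeat=3):
            px = a if x else 1 - a
            py_x = p if y != x else 1 - p
            pz_y = q if z != y else 1 - q
            joint[(x, y, z)] = px * py_x * pz_y
        ent = {idx: entropy(marginal(joint, idx)) for idx in subsets}
        i_xy = ent[(0,)] + ent[(1,)] - ent[(0, 1)]
        i_xz = ent[(0,)] + ent[(2,)] - ent[(0, 2)]
        i_yz = ent[(1,)] + ent[(2,)] - ent[(1, 2)]
        i_xy_given_z = ent[(0, 2)] + ent[(1, 2)] - ent[(0, 1, 2)] - ent[(2,)]
        best_lower = max(best_lower, i_xy - i_xz, i_xy - i_yz)
        best_upper = max(best_upper, min(i_xy, i_xy_given_z))
    return best_lower, best_upper


lower, upper = degraded_bsc_bounds(p=0.1, q=0.2)
print(lower, upper)
```

For this cascade, both maxima are attained at $a = 1/2$ and equal $h(p \star q) - h(p)$, where $h$ is the binary entropy function and $p \star q = p(1-q) + q(1-p)$; with $p = 0.1$ and $q = 0.2$, this is $h(0.26) - h(0.1)$.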


Remark 4.11. The converse technique used in the proof of Theorem 4.8 can also be used to derive simple bounds for secure rates over WTCs; in fact, the secret-key capacity is always an upper bound for the secrecy capacity since a wiretap code is a special case of key-distillation strategy for a channel model. As an application, we revisit the WTC with confidential rate-limited feedback, for which we computed achievable rates in Section 3.6.2. Following the steps used to establish (4.20), we can show that the secret-key capacity with public discussion of the WTC with confidential rate-limited feedback satisfies

$$\frac{1}{n}H(K) \leq \frac{1}{n}I(R_XF^n; R_YY^n|Z^nA^rB^r) + \delta(\epsilon).$$

The only difference from (4.63) is the presence of the term $F^n$ representing the messages sent over the confidential feedback channel. Nevertheless, we have

$$\begin{aligned}
\frac{1}{n}I(R_XF^n; R_YY^n|Z^nA^rB^r) &= \frac{1}{n}I(F^n; R_YY^n|A^rB^rZ^n) + \frac{1}{n}I(R_X; R_YY^n|A^rB^rF^nZ^n)\\
&\leq \frac{1}{n}H(F^n) + \frac{1}{n}I(R_X; R_YY^n|A^rB^rF^nZ^n).
\end{aligned}$$

The second term on the right-hand side is similar to (4.63), since the messages sent over the secure feedback channel appear alongside $A^r$ and $B^r$ and can now be interpreted as public messages. Following the same steps and using the fact that $(1/n)H(F^n) \leq R_f$, we obtain the upper bound

$$\frac{1}{n}H(K) \leq R_f + \frac{1}{n}\sum_{j=1}^n I(X_j; Y_j|Z_j) + \delta(\epsilon).$$

This outer bound coincides with the achievable rate obtained in Proposition 3.9 if the channel is physically degraded.


Strong secrecy from weak secrecy 


In this section, we develop a generic mathematical procedure by which to construct a scheme (wiretap code or key-distillation strategy) that guarantees a strong secrecy condition from a scheme that guarantees a weak secrecy condition. We show that this procedure entails no rate loss, which allows us to prove that $\overline{C}_s^{\text{CM}} = C_s^{\text{CM}}$ for a channel model and $\overline{C}_s = C_s$ for a wiretap channel, and justifies a posteriori the use of a weak secrecy condition in previous chapters.


Proposition 4.9. The strong secret-key capacity $\overline{C}_s^{\text{CM}}$ of a channel model with DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is equal to its weak secret-key capacity $C_s^{\text{CM}}$.

Figure 4.8 From weak to strong secrecy.


Proof. Consider a channel model with DMC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ and a $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ that achieves weak secret-key rates. As illustrated in Figure 4.8, we construct a new strategy by

• using the key-distillation strategy $\mathcal{S}_n$ $m$ times to generate $m$ weakly secure keys;

• treating the weakly secure keys as the realizations of a DMS and distilling strong secret keys by means of information reconciliation and privacy amplification with extractors.


Note that the post-processing of the weakly secure keys is possible because keys are not 
meant to carry any information by themselves and do not need to be known ahead of 
time. 

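Formally, let $\epsilon > 0$. By definition, there exists a $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ with rate $R \geq C_s^{\text{CM}} - \epsilon$ such that

$$\mathbf{P}_e(\mathcal{S}_n) \leq \delta(\epsilon), \quad \frac{1}{n}\mathbf{L}(\mathcal{S}_n) \leq \delta(\epsilon), \quad\text{and}\quad \frac{1}{n}\mathbf{U}(\mathcal{S}_n) \leq \delta(\epsilon).$$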


Alice runs $\mathcal{S}_n$ $m$ times to generate $m$ independent keys. In each run $i \in [1,m]$, Alice obtains a key $K_i$, Bob obtains a key $\hat{K}_i$, and Eve obtains the observations $Z_i^n$ together with public messages $A_i^r$ and $B_i^r$. Effectively, the situation is as if Alice, Bob, and Eve observed $m$ realizations of a DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ with

$$X' \triangleq K, \quad Y' \triangleq \hat{K}, \quad\text{and}\quad Z' \triangleq Z^nA^rB^r.$$
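According to Theorem 4.7, Alice and Bob can distill a strong secret key $\overline{K}$ of length

$$k = m\bigl(I(X';Y') - I(X';Z') - \delta(\epsilon)\bigr)$$

and such that

$$H(\overline{K}|Z'^m\mathcal{S}_n) \geq k - \delta_\epsilon(m)$$

by means of a one-way direct reconciliation protocol and privacy amplification with extractors. Note that

$$\begin{aligned}
I(X';Y') - I(X';Z') &= I(K;\hat{K}|\mathcal{S}_n) - I(K; Z^nA^rB^r|\mathcal{S}_n)\\
&= H(K|\mathcal{S}_n) - H(K|\hat{K}\mathcal{S}_n) - I(K; Z^nA^rB^r|\mathcal{S}_n)\\
&\geq nR - \mathbf{U}(\mathcal{S}_n) - n\delta\bigl(\mathbf{P}_e(\mathcal{S}_n)\bigr) - \mathbf{L}(\mathcal{S}_n)\\
&\geq nC_s^{\text{CM}} - n\delta(\epsilon).
\end{aligned}$$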
Therefore, the rate (in bits per use of the channel $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$) at which the strong secret key $\overline{K}$ is generated is

$$\frac{k}{mn} \geq C_s^{\text{CM}} - \delta(\epsilon).$$

Since $\epsilon$ can be chosen arbitrarily small, we conclude that $\overline{C}_s^{\text{CM}} \geq C_s^{\text{CM}}$ and, therefore, $\overline{C}_s^{\text{CM}} = C_s^{\text{CM}}$. ∎


Proposition 4.10. The strong secrecy capacity $\overline{C}_s$ of a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ is equal to its weak secrecy capacity $C_s$.


Proof. The proof is similar to that of Proposition 4.9 but the objective is now to transmit messages (instead of generating keys) without relying on a public channel. Let $\epsilon > 0$. By definition, there exists a $(2^{nR}, n)$ code $\mathcal{C}_n$ with rate $R \geq C_s - \epsilon$ such that

$$\mathbf{P}_e(\mathcal{C}_n) \leq \delta(\epsilon) \quad\text{and}\quad \frac{1}{n}\mathbf{L}(\mathcal{C}_n) \leq \delta(\epsilon).$$


Alice uses the code $\mathcal{C}_n$ $m$ times to transmit $m$ independent messages. In each run $i \in [1,m]$, Alice transmits a message $M_i$, Bob obtains a message $\hat{M}_i$, and Eve obtains the observations $Z_i^n$. Again, the situation is as if Alice, Bob, and Eve observed $m$ realizations of a DMS $(\mathcal{X'Y'Z'}, p_{X'Y'Z'})$ with

$$X' \triangleq M, \quad Y' \triangleq \hat{M}, \quad\text{and}\quad Z' \triangleq Z^n.$$
According to Theorem 4.7, Alice and Bob can distill a strong secret key $\overline{K}$ of length

$$k = m\bigl(I(X';Y') - I(X';Z') - \delta(\epsilon)\bigr)$$

and such that

$$H(\overline{K}|Z'^m\mathcal{C}_n) \geq k - \delta_\epsilon(m)$$

by means of a one-way direct reconciliation protocol and privacy amplification with extractors. The reader can check that

$$I(X';Y') - I(X';Z') \geq nC_s - n\delta(\epsilon).$$


However, because there is no public channel, the messages required in order to per- 
form reconciliation and privacy amplification must be transmitted over the channel; 
we must also account for these additional channel uses in the calculation of the final 
key rate. 

By Proposition 4.5, there exists a one-way reconciliation protocol that exchanges $m\bigl(H(X'|Y') + \delta(\epsilon)\bigr)$ bits of public messages and that guarantees $\mathbb{P}[\hat{X}'^m \neq X'^m] \leq \delta_\epsilon(m)$. By Theorem 4.6, there exists an extractor that requires the transmission of $m\delta(m)$ bits of uniform randomness in order to distill the key $\overline{K}$. Alice can transmit these bits to Bob over the main channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ with an error-correcting code of length $m$. Let $C_{\text{m}}$ denote the capacity of the channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$. From Shannon’s channel coding theorem, we know that there exists a code of rate $C_{\text{m}} - \delta(\epsilon)$ that guarantees an average probability of error less than $\delta_\epsilon(m)$. Therefore, the transmission of these additional bits requires

$$n' = \frac{m\bigl(H(X'|Y') + \delta(\epsilon)\bigr) + m\delta(m)}{C_{\text{m}} - \delta(\epsilon)}$$

channel uses. By virtue of Fano’s inequality,

$$H(X'|Y') = H(M|\hat{M}) \leq H_b\bigl(\mathbf{P}_e(\mathcal{C}_n)\bigr) + \mathbf{P}_e(\mathcal{C}_n)\, nR = n\delta_\epsilon(n).$$


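All in all, the strong secret key $\overline{K}$ can be generated at a rate

$$\frac{k}{mn + n'} \geq \frac{m\bigl(nC_s - n\delta(\epsilon)\bigr)}{mn + \dfrac{m\bigl(n\delta_\epsilon(n) + \delta(\epsilon)\bigr) + m\delta(m)}{C_{\text{m}} - \delta(\epsilon)}} = C_s - \delta(\epsilon) - \delta_\epsilon(n, m).$$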


In other words, for n and m large enough, the transmission of reconciliation and privacy- 
amplification messages over the channel incurs a negligible rate penalty. 

To conclude, it remains to show that the key $\overline{K}$ can be interpreted as a message so that the key rate is a message rate. Notice that all communications over the channel are one-way; therefore, in principle, Alice could choose the final key $\overline{K}$ ahead of time, “invert” the privacy-amplification process, and artificially split the transmission over $mn$ channel uses. Hence, the final strong secret key $\overline{K}$ can be treated as a message $M$ that satisfies a strong secrecy condition. Since $\epsilon$ can be arbitrarily small, we obtain $\overline{C}_s \geq C_s$ and thus $\overline{C}_s = C_s$. ∎


Remark 4.12. Privacy amplification with extractors is critical to obtaining strong secrecy with a negligible rate penalty. It is possible to show that the minimum size of a universal family of hash functions $\mathcal{G}: \{0,1\}^{mn} \to \{0,1\}^k$ is $2^{mn-k}$; therefore, the minimum number of bits to describe a randomly chosen hash function in the family is $mn - k$ and the number of channel uses required to transmit this choice would be

$$\frac{mn - k}{C_{\text{m}} - \delta(\epsilon)},$$

which incurs a non-negligible rate penalty.
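The contrast drawn in Remark 4.12 can be quantified with a back-of-the-envelope Python sketch. All assumptions are ours: the weakly secure scheme yields $k = C_s mn$ key bits, reconciliation traffic and all $\delta(\epsilon)$ terms are ignored, the main channel carries $C_{\text{m}}$ bits per use, and the extractor seed length is taken as $\lceil\log_2(mn)\rceil^3$ bits, a hypothetical stand-in for a polylogarithmic $d$ rather than the exact value of Proposition 4.8.

```python
import math


def final_rate(n: int, m: int, cs: float, cm: float, seed_bits: float) -> float:
    """Rate k / (mn + n') with k = cs*m*n and n' = seed_bits / cm extra channel uses.

    Illustrative assumptions: reconciliation messages and delta terms are ignored.
    """
    k = cs * m * n
    n_prime = seed_bits / cm
    return k / (m * n + n_prime)


n, m, cs, cm = 1000, 1000, 0.2, 0.5
k = cs * m * n
# Universal hashing: describing the hash function costs mn - k bits.
hash_rate = final_rate(n, m, cs, cm, seed_bits=m * n - k)
# Extractor with a hypothetical polylogarithmic seed length.
ext_rate = final_rate(n, m, cs, cm, seed_bits=math.ceil(math.log2(m * n)) ** 3)
print(f"hash: {hash_rate:.4f}, extractor: {ext_rate:.4f}, target C_s: {cs}")
```

With universal hashing the rate becomes $C_s / (1 + (1 - C_s)/C_{\text{m}})$ whatever the values of $n$ and $m$, so the penalty never vanishes, whereas a short seed leaves the rate close to $C_s$.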


Remark 4.13. The mathematical procedure used to convert weakly secure codes into strongly secure codes is not really practical. Although the “inversion” of the process is conceptually feasible, it quickly becomes intractable as $n$ and $m$ become large. Hence, this procedure does not replace the construction of wiretap codes, which we discuss in Chapter 6.


Conclusions and lessons learned 


The results obtained in this chapter allow us to draw several crucial conclusions. First 
and foremost, the analysis of the secret-key capacity for source and channel models 
shows that feedback improves secrecy. We already knew from Section 3.6.2 that secure 
feedback is beneficial for secrecy, but this statement remains true even if the feedback
is known to the eavesdropper. This suggests that the need for an advantage over the 
eavesdropper at the physical layer highlighted in Chapter 3 was largely the consequence 
of the restrictions imposed on the coding schemes. 

Second, secret-key agreement seems much more practical than wiretap coding at this 
point. The sequential key-distillation strategies based on advantage distillation, infor- 
mation reconciliation, and privacy amplification described in Section 4.3 handle the 
reliability and secrecy requirements independently, which leads to effective ways of 
distilling secret keys. Nevertheless, note that the fundamental limits of secret-key agree- 
ment from source models or channel models are not as well understood as those of secure 
communication over wiretap channels. This state of affairs is partially explained by the 
fact that two-way communications seem to be an essential ingredient of key-distillation 
strategies, and they are significantly harder to analyze than one-way communications. 
One should also note the tight connection between key distillation and source coding 
with side information. 

Finally, the study of secret-key agreement allows us to develop a generic mathemat- 
ical procedure by which to strengthen secrecy results, which justifies a posteriori the 
relevance of the fundamental limits derived with a weak secrecy criterion in Chapter 3 
and the first sections of this chapter. In some sense, strong secrecy comes “for free” but 
the reader should keep in mind that, although the fundamental limits of secure com- 
munication and secret-key generation remain the same, the coding schemes achieving 
strong secrecy might be quite different from those achieving weak secrecy. 


Appendix 


Proof of Lemma 4.1 

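Proof. The result will follow from Chebyshev’s inequality, provided that we first establish an upper bound for $\operatorname{Var}\bigl(p_{K_{\mathcal{S}_n}}(1)\bigr) = \mathbb{E}_{\mathcal{S}_n}\bigl[(p_{K_{\mathcal{S}_n}}(1))^2\bigr] - \mathbb{E}_{\mathcal{S}_n}\bigl[p_{K_{\mathcal{S}_n}}(1)\bigr]^2$. By virtue of the definition of $K_{\mathcal{S}_n}$, we can write $p_{K_{\mathcal{S}_n}}(1)$ as

$$p_{K_{\mathcal{S}_n}}(1) = \sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\, \mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr),$$

where $\kappa_n$ is the key-distillation function used by Alice in the strategy $\mathcal{S}_n$. Hence,

$$\mathbb{E}_{\mathcal{S}_n}\bigl[p_{K_{\mathcal{S}_n}}(1)\bigr] = \sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\, \mathbb{E}_{\mathcal{S}_n}\bigl[\mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\bigr] = \sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\, 2^{-nR} = 2^{-nR}.$$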
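Similarly,

$$\begin{aligned}
\mathbb{E}_{\mathcal{S}_n}\bigl[(p_{K_{\mathcal{S}_n}}(1))^2\bigr] &= \mathbb{E}_{\mathcal{S}_n}\Biggl[\Bigl(\sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\, \mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\Bigr)^2\Biggr]\\
&= \mathbb{E}_{\mathcal{S}_n}\Biggl[\sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \Bigl(\frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\Bigr)^2 \mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\Biggr]\\
&\quad + \mathbb{E}_{\mathcal{S}_n}\Biggl[\sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \sum_{\substack{x'^n \in \mathcal{T}_\epsilon^n(X)\\ x'^n \neq x^n}} \frac{p_{X^n}(x^n)\, p_{X^n}(x'^n)}{\mathbb{P}[E=1]^2}\, \mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\mathbb{1}\bigl(\kappa_n(x'^n) = 1\bigr)\Biggr].
\end{aligned}$$

Hence,

$$\begin{aligned}
\mathbb{E}_{\mathcal{S}_n}\bigl[(p_{K_{\mathcal{S}_n}}(1))^2\bigr] &= \sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \Bigl(\frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\Bigr)^2 \mathbb{E}_{\mathcal{S}_n}\bigl[\mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\bigr]\\
&\quad + \sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \sum_{\substack{x'^n \in \mathcal{T}_\epsilon^n(X)\\ x'^n \neq x^n}} \frac{p_{X^n}(x^n)\, p_{X^n}(x'^n)}{\mathbb{P}[E=1]^2}\, \mathbb{E}_{\mathcal{S}_n}\bigl[\mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\mathbb{1}\bigl(\kappa_n(x'^n) = 1\bigr)\bigr].
\end{aligned}$$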


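Using the AEP, we bound the first term on the right-hand side as

$$\sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \Bigl(\frac{p_{X^n}(x^n)}{\mathbb{P}[E=1]}\Bigr)^2 \mathbb{E}_{\mathcal{S}_n}\bigl[\mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\bigr] \leq \frac{2^{-n(H(X)-\delta(\epsilon))}}{\mathbb{P}[E=1]\, 2^{nR}} \leq \frac{2^{-n(H(X)-\delta(\epsilon))}}{2^{nR}\bigl(1 - \delta_\epsilon(n)\bigr)}.$$

Similarly, we bound the second term on the right-hand side as

$$\sum_{x^n \in \mathcal{T}_\epsilon^n(X)} \sum_{\substack{x'^n \in \mathcal{T}_\epsilon^n(X)\\ x'^n \neq x^n}} \frac{p_{X^n}(x^n)\, p_{X^n}(x'^n)}{\mathbb{P}[E=1]^2}\, \mathbb{E}_{\mathcal{S}_n}\bigl[\mathbb{1}\bigl(\kappa_n(x^n) = 1\bigr)\mathbb{1}\bigl(\kappa_n(x'^n) = 1\bigr)\bigr] \leq \Bigl(\frac{1}{2^{nR}}\Bigr)^2.$$

Therefore, we obtain the following bound for $\operatorname{Var}\bigl(p_{K_{\mathcal{S}_n}}(1)\bigr)$:

$$\operatorname{Var}\bigl(p_{K_{\mathcal{S}_n}}(1)\bigr) = \mathbb{E}_{\mathcal{S}_n}\bigl[(p_{K_{\mathcal{S}_n}}(1))^2\bigr] - \mathbb{E}_{\mathcal{S}_n}\bigl[p_{K_{\mathcal{S}_n}}(1)\bigr]^2 \leq \frac{1}{2^{nR}}\, \frac{2^{-n(H(X)-\delta(\epsilon))}}{1 - \delta_\epsilon(n)}.$$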


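By virtue of Chebyshev’s inequality,

$$\mathbb{P}_{\mathcal{S}_n}\Bigl[\bigl|p_{K_{\mathcal{S}_n}}(1) - 2^{-nR}\bigr| > \epsilon 2^{-nR}\Bigr] \leq \frac{\operatorname{Var}\bigl(p_{K_{\mathcal{S}_n}}(1)\bigr)}{\epsilon^2 2^{-2nR}} \leq \frac{1}{\epsilon^2}\, \frac{2^{-n(H(X)-R-\delta(\epsilon))}}{1 - \delta_\epsilon(n)}.$$

If $R < H(X) - \delta(\epsilon)$, then $\mathbb{P}_{\mathcal{S}_n}\bigl[\bigl|p_{K_{\mathcal{S}_n}}(1) - 2^{-nR}\bigr| > \epsilon 2^{-nR}\bigr] \leq \delta_\epsilon(n)$. ∎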
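The concentration behind Lemma 4.1 can be observed on a toy example. The Python sketch below draws random binnings of $\{0,1\}^n$ into $2^{nR}$ bins and computes $p_K(1)$ for an i.i.d. Bernoulli source. Simplifications (ours): $n = 16$ so that all sequences can be enumerated, and no restriction to the typical set, so the normalization by $\mathbb{P}[E = 1]$ disappears; the helper names are hypothetical. With $R = 0.25$, well below $H(X) \approx 0.88$ for a Bernoulli$(0.3)$ source, the relative deviation $|p_K(1) - 2^{-nR}| / 2^{-nR}$ should stay small.

```python
import random


def p_key_one(n, p1, num_bins, rng):
    """p_K(1) for one random binning: every x^n in {0,1}^n gets an independent,
    uniformly random bin, and bin 0 plays the role of the key value 1.
    Source: X^n i.i.d. Bernoulli(p1), enumerated exhaustively."""
    probs = [p1 ** w * (1 - p1) ** (n - w) for w in range(n + 1)]  # p(x^n) by Hamming weight
    total = 0.0
    for x in range(2 ** n):
        if rng.randrange(num_bins) == 0:
            total += probs[bin(x).count("1")]
    return total


def relative_deviations(n, p1, rate, trials, seed=0):
    """Relative deviations |p_K(1) - 2^{-nR}| / 2^{-nR} over independent binnings."""
    rng = random.Random(seed)
    num_bins = 2 ** round(n * rate)
    target = 1 / num_bins
    return [abs(p_key_one(n, p1, num_bins, rng) - target) / target for _ in range(trials)]


devs = relative_deviations(n=16, p1=0.3, rate=0.25, trials=10)
print(max(devs))
```

Raising `rate` toward or above $H(X)$ makes the bins so small that only a few sequences land in each one, and the relative deviations then become of order one, in line with the condition $R < H(X) - \delta(\epsilon)$.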
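Proof of Lemma 4.3
Proof. Let $t \triangleq r + 1$. We show that $\mathbb{P}_U[\chi(U) = 0] \leq 2^{-r}$ by developing an upper bound as follows:

$$\begin{aligned}
\mathbb{P}_U[\chi(U) = 0] &= \mathbb{P}_U\bigl[H_c(S) - \log|\mathcal{U}| - 2t - H_c(S|U) > 0\bigr]\\
&= \mathbb{P}_U\bigl[H_c(S) + \log p_U(U) - H_c(S|U) - t - \log p_U(U) - \log|\mathcal{U}| - t > 0\bigr]\\
&\leq \mathbb{P}_U\bigl[H_c(S) + \log p_U(U) - H_c(S|U) - t > 0\bigr] + \mathbb{P}_U\bigl[-\log p_U(U) - \log|\mathcal{U}| - t > 0\bigr].
\end{aligned}$$

We introduce the random variables

$$X_U \triangleq 2^{\log p_U(U) - H_c(S|U) + H_c(S)} \quad\text{and}\quad Y_U \triangleq -\log|\mathcal{U}| - \log p_U(U),$$

so that we obtain

$$\mathbb{P}_U[\chi(U) = 0] \leq \mathbb{P}[X_U > 2^t] + \mathbb{P}[Y_U > t], \tag{4.76}$$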


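and we upper bound each term on the right-hand side of (4.76) separately. First, note that

$$\mathbb{P}[Y_U > t] = \mathbb{P}\Bigl[p_U(U) < \frac{2^{-t}}{|\mathcal{U}|}\Bigr] = \sum_{u \in \mathcal{U}:\, p_U(u) < 2^{-t}/|\mathcal{U}|} p_U(u) < 2^{-t}. \tag{4.77}$$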

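Next, we develop an upper bound for $\mathbb{P}[X_U > 2^t]$. Since $X_U$ is positive, we establish it by upper bounding $\mathbb{E}_U[X_U]$ and using Markov’s inequality. We start by writing the collision entropy $H_c(SU)$ as

$$\begin{aligned}
H_c(SU) &= -\log\Bigl(\sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}} p_{SU}(s,u)^2\Bigr)\\
&= -\log\Bigl(\sum_{u \in \mathcal{U}} p_U(u)^2 \sum_{s \in \mathcal{S}} p_{S|U}(s|u)^2\Bigr).
\end{aligned}$$

Note that

$$\sum_{s \in \mathcal{S}} p_{S|U}(s|u)^2 = 2^{-H_c(S|U=u)} \quad\text{and}\quad p_U(u) = 2^{\log p_U(u)};$$

therefore,

$$H_c(SU) = -\log\Bigl(\sum_{u \in \mathcal{U}} p_U(u)\, 2^{\log p_U(u) - H_c(S|U=u)}\Bigr),$$

which we can rewrite as

$$2^{-H_c(SU)} = \mathbb{E}_U\bigl[2^{\log p_U(U) - H_c(S|U)}\bigr].$$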
Hence,

$$\mathbb{E}_U[X_U] = 2^{-H_c(SU) + H_c(S)}.$$

The reader can easily check that $H_c(S) \leq H_c(SU)$ and, therefore, $\mathbb{E}_U[X_U] \leq 1$. Hence, by virtue of Markov’s inequality,

$$\mathbb{P}_U[X_U > 2^t] \leq \frac{\mathbb{E}_U[X_U]}{2^t} \leq 2^{-t}. \tag{4.78}$$

On substituting (4.77) and (4.78) into (4.76), we finally obtain

$$\mathbb{P}_U[\chi(U) = 0] \leq 2^{-t} + 2^{-t} = 2^{-(t-1)} = 2^{-r}.$$

∎
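Both the identity $2^{-H_c(SU)} = \mathbb{E}_U\bigl[2^{\log p_U(U) - H_c(S|U)}\bigr]$ and the final bound can be checked numerically. The Python sketch below assumes, as the first step of the proof indicates, that $\chi(u) = 1$ if and only if $H_c(S) - H_c(S|U = u) \leq \log|\mathcal{U}| + 2t = \log|\mathcal{U}| + 2r + 2$; the helper names `collision_entropy` and `check_lemma_4_3` are hypothetical, and logarithms are in base 2.

```python
import math
import random


def collision_entropy(probs):
    """H_c in bits: -log2 of the sum of squared probabilities."""
    return -math.log2(sum(p * p for p in probs))


def check_lemma_4_3(p_su, r):
    """For a joint pmf p_su[s][u], return
    (P_U[chi(U) = 1], 1 - 2**-r, 2**-H_c(SU), E_U[2**(log2 p_U(U) - H_c(S|U))]),
    assuming chi(u) = 1 iff H_c(S) - H_c(S|U=u) <= log2|U| + 2r + 2."""
    n_s, n_u = len(p_su), len(p_su[0])
    p_s = [sum(row) for row in p_su]
    p_u = [sum(p_su[s][u] for s in range(n_s)) for u in range(n_u)]
    h_s = collision_entropy(p_s)
    h_su = collision_entropy([p for row in p_su for p in row])
    prob_chi_one, expectation = 0.0, 0.0
    for u in range(n_u):
        if p_u[u] == 0:
            continue  # zero-probability values of U contribute nothing
        h_s_u = collision_entropy([p_su[s][u] / p_u[u] for s in range(n_s)])
        expectation += p_u[u] * 2 ** (math.log2(p_u[u]) - h_s_u)
        if h_s - h_s_u <= math.log2(n_u) + 2 * r + 2:
            prob_chi_one += p_u[u]
    return prob_chi_one, 1 - 2 ** (-r), 2 ** (-h_su), expectation


rng = random.Random(9)
weights = [[rng.random() ** 8 for _ in range(6)] for _ in range(4)]
total = sum(map(sum, weights))
joint = [[w / total for w in row] for row in weights]
print(check_lemma_4_3(joint, r=1.0))
```

The third and fourth returned values should agree up to rounding, which is the rewriting of $H_c(SU)$ used above, and the first should never fall below the second.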


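Proof of Lemma 4.4
Proof. Using a strengthened version of Theorem 2.1 and Corollary 2.1 [6], we obtain

$$\mathbb{P}\bigl[Z^n \in \mathcal{T}_\epsilon^n(Z)\bigr] \geq 1 - \frac{2^{-\sqrt{n}}}{2} \quad\text{and}\quad \mathbb{P}\bigl[(X^n, Z^n) \in \mathcal{T}_\epsilon^n(XZ)\bigr] \geq 1 - \frac{2^{-\sqrt{n}}}{2}$$

for $n$ sufficiently large. Hence, by the union bound, $\mathbb{P}[\Theta = 1] \geq 1 - 2^{-\sqrt{n}}$. In addition, for $z^n \in \mathcal{T}_\epsilon^n(Z)$, Theorem 2.2 guarantees that $\mathbb{P}_{X^n|Z^n}\bigl[X^n \in \mathcal{T}_\epsilon^n(XZ|z^n)\,\big|\,Z^n = z^n\bigr] \geq 1 - \delta_\epsilon(n)$.

To obtain the remaining part of the lemma, it suffices to establish a lower bound for the min-entropy $H_\infty(X^n|Z^n = z^n, \Theta = 1)$ with $z^n \in \mathcal{T}_\epsilon^n(Z)$ because Proposition 4.7 already proves that

$$H_c(X^n|Z^n = z^n, \Theta = 1) \geq H_\infty(X^n|Z^n = z^n, \Theta = 1).$$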
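By definition,

$$H_\infty(X^n|Z^n = z^n, \Theta = 1) = -\log \max_{x^n} p_{X^n|Z^n\Theta}(x^n|z^n, 1).$$

By Bayes’ rule, we obtain for all $x^n \in \mathcal{X}^n$

$$\begin{aligned}
p_{X^n|Z^n\Theta}(x^n|z^n, 1) &= \frac{\mathbb{P}[\Theta = 1|X^n = x^n, Z^n = z^n]\, p_{X^n|Z^n}(x^n|z^n)}{\mathbb{P}[\Theta = 1|Z^n = z^n]}\\
&= \frac{\mathbb{P}[\Theta = 1|X^n = x^n, Z^n = z^n]\, p_{X^n|Z^n}(x^n|z^n)}{\mathbb{P}\bigl[X^n \in \mathcal{T}_\epsilon^n(XZ|z^n)\,\big|\,Z^n = z^n\bigr]}\\
&\leq \frac{2^{-n(H(X|Z) - \delta(\epsilon))}}{1 - \delta_\epsilon(n)},
\end{aligned}$$

where the last inequality follows from $p_{X^n|Z^n}(x^n|z^n) \leq 2^{-n(H(X|Z) - \delta(\epsilon))}$ by Theorem 2.1 if $\mathbb{P}[\Theta = 1|X^n = x^n, Z^n = z^n] > 0$. Hence,

$$\begin{aligned}
H_\infty(X^n|Z^n = z^n, \Theta = 1) &\geq n\bigl(H(X|Z) - \delta(\epsilon)\bigr) + \log\bigl(1 - \delta_\epsilon(n)\bigr)\\
&= n\bigl(H(X|Z) - \delta(\epsilon)\bigr) - \delta_\epsilon(n).
\end{aligned}$$

∎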
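Proof of Lemma 4.5
Proof. We show that $\mathbb{P}_U[\chi(U) = 0] \leq 2^{-r}$. Note that $H_\infty(S|U=u)$ satisfies

$$\begin{aligned}
H_\infty(S|U=u) &= -\log \max_s p_{S|U}(s|u)\\
&= -\log \max_s p_{SU}(s,u) + \log p_U(u)\\
&\geq -\log \max_s p_S(s) + \log p_U(u)\\
&= H_\infty(S) + \log p_U(u),
\end{aligned}$$

where the inequality follows because $p_{SU}(s,u) \leq p_S(s)$ for all $(s,u) \in \mathcal{S} \times \mathcal{U}$. Hence,

$$\begin{aligned}
\mathbb{P}_U[\chi(U) = 0] &= \mathbb{P}_U\bigl[H_\infty(S) > H_\infty(S|U) + r + \log|\mathcal{U}|\bigr]\\
&\leq \mathbb{P}\bigl[\log p_U(U) < -r - \log|\mathcal{U}|\bigr]\\
&= \mathbb{P}\Bigl[p_U(U) < \frac{2^{-r}}{|\mathcal{U}|}\Bigr]\\
&= \sum_{u \in \mathcal{U}:\, p_U(u) < 2^{-r}/|\mathcal{U}|} p_U(u)\\
&< 2^{-r}.
\end{aligned}$$

∎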


Security limits of Gaussian and wireless channels


This chapter extends the results obtained in Chapter 3 and Chapter 4 for discrete mem- 
oryless channels and sources to Gaussian channels and wireless channels, for which 
numerical applications provide insight beyond that of the general formula in Theo- 
rem 3.3. Gaussian channels are of particular importance, not only because the secrecy 
capacity admits a simple, intuitive, and easily computable expression but also because 
they provide a reasonable approximation of the physical layer encountered in many 
practical systems. The analysis of Gaussian channels also lays the foundations for the 
study of wireless channels. 

The application of physical-layer security paradigms to wireless channels is perhaps 
one of the most promising research directions in physical-layer security. While wireline 
systems offer some security, because the transmission medium is confined, wireless sys- 
tems are intrinsically susceptible to eavesdropping since all transmissions are broadcast 
over the air and overheard by neighboring devices. Other users can be viewed as potential 
eavesdroppers if they are not the intended recipients of a message. However, as seen 
in earlier chapters, the randomness present at the physical layer can be harnessed to 
provide security, and randomness is a resource that abounds in a wireless medium. For 
instance, we show that fading can be exploited opportunistically to guarantee secrecy 
even if an eavesdropper obtains on average a higher signal-to-noise ratio than a legitimate 
receiver. 

We start this chapter with a detailed study of Gaussian channels and sources, includ- 
ing multiple-input multiple-output channels (Section 5.1.2). We then move on to wire- 
less channels, and we analyze the fundamental limits of secure communications for 
ergodic fading (Section 5.2.1), block fading (Section 5.2.2), and quasi-static fading 
(Section 5.2.3). 


5.1 Gaussian channels and sources


5.1.1 Gaussian broadcast channel with confidential messages


Communication over a (real) Gaussian broadcast channel with confidential messages 
(Gaussian BCC for short) is illustrated in Figure 5.1. This channel model is a specific 
instance of a BCC in which the codewords transmitted by Alice are corrupted by 
additive Gaussian noise. Specifically, the relationships between the inputs and outputs
of the channel are given by

$$Y_i = X_i + N_{m,i} \quad\text{and}\quad Z_i = X_i + N_{e,i},$$

where the noise processes $\{N_{m,i}\}_{i\geq1}$ and $\{N_{e,i}\}_{i\geq1}$ are i.i.d. and
$N_{m,i} \sim \mathcal{N}(0,\sigma_m^2)$ and $N_{e,i} \sim \mathcal{N}(0,\sigma_e^2)$.

Figure 5.1 Communication over a Gaussian BCC.


The statistics of $N_{m,i}$ and $N_{e,i}$ are assumed known to the transmitter, the receiver, and
the eavesdropper prior to transmission. The input of the channel is also subject to an
average power constraint

$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left[X_i^2\right] \leq P.$$


Definitions 3.6 and 3.7 for codes, achievable rates, and the secrecy capacity then apply
readily to the Gaussian BCC. The key property of the Gaussian BCC that makes it
more amenable to study than the general BCC is that either the eavesdropper's channel
is stochastically degraded with respect to the main channel or the main channel is
stochastically degraded with respect to the eavesdropper's channel. In fact, if $\sigma_e^2 >
\sigma_m^2$, the marginal probabilities $p_{Y|X}$ and $p_{Z|X}$ are the same as those of the channel
characterized by

$$Y_i = X_i + N_{m,i} \quad\text{and}\quad Z_i = Y_i + N'_i, \quad\text{with } N'_i \sim \mathcal{N}(0, \sigma_e^2 - \sigma_m^2).$$


Similarly, if $\sigma_e^2 < \sigma_m^2$, the marginal probabilities $p_{Y|X}$ and $p_{Z|X}$ are the same as those of
the channel characterized by

$$Z_i = X_i + N_{e,i} \quad\text{and}\quad Y_i = Z_i + N'_i, \quad\text{with } N'_i \sim \mathcal{N}(0, \sigma_m^2 - \sigma_e^2).$$

In the latter case, the secrecy capacity is zero according to Proposition 3.4.


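Theorem 5.1 (Liang et al.). The secrecy-capacity region of the Gaussian BCC is

$$\mathcal{C}^{\mathrm{BCC}} = \bigcup_{\beta\in[0,1]} \left\{ (R_0,R_1) :
\begin{array}{l}
R_0 \leq \min\left(\frac{1}{2}\log\left(1+\frac{(1-\beta)P}{\beta P+\sigma_m^2}\right), \frac{1}{2}\log\left(1+\frac{(1-\beta)P}{\beta P+\sigma_e^2}\right)\right)\\[4pt]
R_1 \leq \left(\frac{1}{2}\log\left(1+\frac{\beta P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1+\frac{\beta P}{\sigma_e^2}\right)\right)^{+}
\end{array}
\right\}.$$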
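For readers who want to see the shape of the region numerically, the Python sketch below evaluates, for a given power split $\beta$, the corner point $(R_0, R_1)$ of the corresponding rectangle in Theorem 5.1; the region itself is the union of the rectangles $[0, R_0] \times [0, R_1]$. The function names are hypothetical, and measuring rates in bits per channel use (base-2 logarithms) is an assumption of the sketch.

```python
import math


def bcc_corner_point(beta, P, sigma_m2, sigma_e2):
    """Corner (R0, R1) of the rectangle indexed by beta in Theorem 5.1.

    A fraction beta of the power carries the confidential message and the
    remaining fraction (1 - beta) carries the common message. Rates are in
    bits per channel use (base-2 logarithms, an assumption of this sketch).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    common_power = (1.0 - beta) * P
    r0 = min(
        0.5 * math.log2(1.0 + common_power / (beta * P + sigma_m2)),
        0.5 * math.log2(1.0 + common_power / (beta * P + sigma_e2)),
    )
    r1 = max(
        0.0,
        0.5 * math.log2(1.0 + beta * P / sigma_m2)
        - 0.5 * math.log2(1.0 + beta * P / sigma_e2),
    )
    return r0, r1


def bcc_boundary(P, sigma_m2, sigma_e2, steps=11):
    """Corner points for beta = 0, 1/(steps - 1), ..., 1."""
    return [
        bcc_corner_point(k / (steps - 1), P, sigma_m2, sigma_e2)
        for k in range(steps)
    ]


if __name__ == "__main__":
    # Hypothetical parameters with sigma_m^2 < sigma_e^2.
    for r0, r1 in bcc_boundary(P=1.0, sigma_m2=0.64, sigma_e2=1.0):
        print(f"R0 = {r0:.4f} bits, R1 = {r1:.4f} bits")
```

Sweeping $\beta$ from 0 to 1 moves from the point with no confidential rate to the point at which $R_0 = 0$ and $R_1$ equals the secrecy capacity of the underlying wiretap channel.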
Proof. If we treat the Gaussian BCC as a special case of the BCC studied in Section 3.5,
then the achievability of $\mathcal{C}^{\mathrm{BCC}}$ follows from Theorem 3.3 with the following choice of
random variables:

$$U \sim \mathcal{N}(0,(1-\beta)P), \quad V \triangleq U + X' \text{ with } X' \sim \mathcal{N}(0,\beta P) \text{ independent of } U, \quad\text{and}\quad X \triangleq V.$$


To make the proof rigorous, we would need to modify the proof of Section 3.5.2 appro- 
priately to take into account the continuous nature of the Gaussian BCC and the power 
constraint. This can be done by noting that strongly typical sequences can be replaced 
by weakly typical sequences in the random-coding argument to handle continuous
distributions.¹ Then, the input power constraint can be dealt with by introducing an error
event that accounts for the violation of the constraint as done in [3, Chapter 9]. 

For the converse part of the proof, note that all the steps in Section 3.5.3 up to (3.57) 
involve “basic” properties of mutual information (the chain rule and positivity) that hold 
irrespective of the continuous or discrete nature of the channel. Therefore, if a rate pair
$(R_0, R_1)$ is achievable for the Gaussian BCC, it must hold for any $\epsilon > 0$ that


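$$\begin{aligned}
R_0 &\leq \frac{1}{n}\min\left(\sum_{i=1}^{n} I(M_0Y^{i-1}\tilde{Z}^{i+1}; Y_i), \sum_{i=1}^{n} I(M_0Y^{i-1}\tilde{Z}^{i+1}; Z_i)\right) + \delta(\epsilon),\\
R_1 &\leq \frac{1}{n}\sum_{i=1}^{n}\left(I(M_1; Y_i|M_0Y^{i-1}\tilde{Z}^{i+1}) - I(M_1; Z_i|M_0Y^{i-1}\tilde{Z}^{i+1})\right) + \delta(\epsilon),
\end{aligned} \tag{5.1}$$

where we have used $Y^{i-1} \triangleq (Y_1 \dots Y_{i-1})$ and $\tilde{Z}^{i+1} \triangleq (Z_{i+1} \dots Z_n)$. Next, we introduce
the random variables $U_i \triangleq Y^{i-1}\tilde{Z}^{i+1}M_0$ and $V_i \triangleq U_iM_1$. One can verify that the joint
distribution of $U_i$, $V_i$, $X_i$, $Y_i$, and $Z_i$ satisfies

$$\forall (u,v,x,y,z) \in \mathcal{U}\times\mathcal{V}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}:\quad
p_{U_iV_iX_iY_iZ_i}(u,v,x,y,z) = p_{U_i}(u)\,p_{V_i|U_i}(v|u)\,p_{X_i|V_i}(x|v)\,p_{YZ|X}(y,z|x),$$

where $p_{YZ|X}$ are the transition probabilities of the Gaussian BCC. On substituting these
random variables into (5.1), we obtain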


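$$R_0 \leq \min\left(\frac{1}{n}\sum_{i=1}^{n} I(U_i;Y_i), \frac{1}{n}\sum_{i=1}^{n} I(U_i;Z_i)\right) + \delta(\epsilon), \tag{5.2}$$

$$R_1 \leq \frac{1}{n}\sum_{i=1}^{n}\left(I(V_i;Y_i|U_i) - I(V_i;Z_i|U_i)\right) + \delta(\epsilon). \tag{5.3}$$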


It remains to upper bound (5.2) and (5.3) with terms that depend on the power constraint
$P$. We first assume that $\sigma_m^2 \leq \sigma_e^2$ so that the eavesdropper's channel is stochastically
degraded with respect to the main channel. We expand $(1/n)\sum_{i=1}^{n} I(U_i;Y_i)$ in terms of
the differential entropy as

$$\frac{1}{n}\sum_{i=1}^{n} I(U_i;Y_i) = \frac{1}{n}\sum_{i=1}^{n} h(Y_i) - \frac{1}{n}\sum_{i=1}^{n} h(Y_i|U_i), \tag{5.4}$$


¹ Note that the use of weakly typical sequences limits us to bounds on the probability of error of the form
$P_e(\mathcal{C}_n) \leq \delta(\epsilon)$ instead of $P_e(\mathcal{C}_n) \leq \delta_\epsilon(n)$ for DMCs; however, this has no effect on the
secrecy-capacity region.
and we bound each sum separately. Notice that $\mathbb{E}[Y_i^2] = \mathbb{E}[X_i^2] + \sigma_m^2$ since $Y_i = X_i + N_{m,i}$
and $X_i$ is independent of $N_{m,i}$. The differential entropy of $Y_i$ is upper bounded by
the differential entropy of a Gaussian random variable with the same variance; therefore,

$$\frac{1}{n}\sum_{i=1}^{n} h(Y_i) \leq \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log\left(2\pi e\left(\mathbb{E}[X_i^2] + \sigma_m^2\right)\right).$$


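Since $x \mapsto \log(2\pi e x)$ is a concave function of $x$, we have, by application of Jensen's
inequality,

$$\frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log\left(2\pi e\left(\mathbb{E}[X_i^2] + \sigma_m^2\right)\right) \leq \frac{1}{2}\log\left(2\pi e\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i^2] + \sigma_m^2\right)\right).$$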


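On setting $Q \triangleq (1/n)\sum_{i=1}^{n}\mathbb{E}[X_i^2]$, we finally obtain

$$\frac{1}{n}\sum_{i=1}^{n} h(Y_i) \leq \frac{1}{2}\log\left(2\pi e(Q + \sigma_m^2)\right). \tag{5.5}$$

To bound the second sum $(1/n)\sum_{i=1}^{n} h(Y_i|U_i)$, notice that

$$\frac{1}{n}\sum_{i=1}^{n} h(Y_i|U_i) \leq \frac{1}{n}\sum_{i=1}^{n} h(Y_i) \leq \frac{1}{2}\log\left(2\pi e(Q + \sigma_m^2)\right).$$

Moreover, because $U_i \rightarrow X_i \rightarrow Y_iZ_i$ forms a Markov chain, we have

$$\frac{1}{n}\sum_{i=1}^{n} h(Y_i|U_i) \geq \frac{1}{n}\sum_{i=1}^{n} h(Y_i|X_iU_i) = \frac{1}{n}\sum_{i=1}^{n} h(N_{m,i}) = \frac{1}{2}\log(2\pi e\sigma_m^2).$$

Since $x \mapsto \frac{1}{2}\log\left(2\pi e(xQ + \sigma_m^2)\right)$ is a continuous function on the interval $[0,1]$, the
intermediate-value theorem ensures the existence of $\beta \in [0,1]$ such that

$$\frac{1}{n}\sum_{i=1}^{n} h(Y_i|U_i) = \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_m^2)\right). \tag{5.6}$$

On substituting (5.5) and (5.6) into (5.4), we obtain

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} I(U_i;Y_i) &\leq \frac{1}{2}\log\left(2\pi e(Q + \sigma_m^2)\right) - \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_m^2)\right)\\
&= \frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_m^2}\right).
\end{aligned} \tag{5.7}$$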


We now need to upper bound $(1/n)\sum_{i=1}^{n} I(U_i;Z_i)$. If we follow the same steps as above
with $Z_i$ in place of $Y_i$, we obtain the upper bound

$$\frac{1}{n}\sum_{i=1}^{n} I(U_i;Z_i) \leq \frac{1}{2}\log\left(1 + \frac{(1-\beta')Q}{\beta' Q + \sigma_e^2}\right)$$

with $\beta' \in [0,1]$. Unfortunately, this upper bound is not really useful because $\beta'$ is a priori
different from $\beta$; we need an alternative technique to show that $(1/n)\sum_{i=1}^{n} I(U_i;Z_i)$ can
be upper bounded with the same parameters $\beta$ and $Q$ as $(1/n)\sum_{i=1}^{n} I(U_i;Y_i)$. The key
tool that allows us to do so is the entropy-power inequality introduced in Lemma 2.14.




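Note that we can repeat the steps leading to (5.5) with $Z_i$ in place of $Y_i$ to obtain

$$\frac{1}{n}\sum_{i=1}^{n} h(Z_i) \leq \frac{1}{2}\log\left(2\pi e(Q + \sigma_e^2)\right); \tag{5.8}$$

therefore, we need to develop a lower bound for $(1/n)\sum_{i=1}^{n} h(Z_i|U_i)$ as a function of $\beta$
and $Q$. Since we have assumed that the eavesdropper's channel is stochastically degraded
with respect to the main channel, we can write $Z_i = Y_i + N'_i$ with $N'_i \sim \mathcal{N}(0, \sigma_e^2 - \sigma_m^2)$
independent of $(U_i, Y_i)$. Applying the entropy-power inequality to the random variable $Z_i$
conditioned on $U_i = u_i$, we have

$$\begin{aligned}
h(Z_i|U_i = u_i) &= h(Y_i + N'_i|U_i = u_i)\\
&\geq \frac{1}{2}\log\left(2^{2h(Y_i|U_i=u_i)} + 2^{2h(N'_i)}\right)\\
&= \frac{1}{2}\log\left(2^{2h(Y_i|U_i=u_i)} + 2\pi e(\sigma_e^2 - \sigma_m^2)\right).
\end{aligned}$$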


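Therefore,

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} h(Z_i|U_i) &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{U_i}\left[h(Z_i|U_i = u_i)\right]\\
&\geq \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{U_i}\left[\frac{1}{2}\log\left(2^{2h(Y_i|U_i=u_i)} + 2\pi e(\sigma_e^2 - \sigma_m^2)\right)\right]\\
&\overset{(a)}{\geq} \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log\left(2^{2\mathbb{E}_{U_i}[h(Y_i|U_i=u_i)]} + 2\pi e(\sigma_e^2 - \sigma_m^2)\right)\\
&= \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log\left(2^{2h(Y_i|U_i)} + 2\pi e(\sigma_e^2 - \sigma_m^2)\right)\\
&\overset{(b)}{\geq} \frac{1}{2}\log\left(2^{\frac{2}{n}\sum_{i=1}^{n} h(Y_i|U_i)} + 2\pi e(\sigma_e^2 - \sigma_m^2)\right)\\
&\overset{(c)}{=} \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_m^2) + 2\pi e(\sigma_e^2 - \sigma_m^2)\right)\\
&= \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_e^2)\right),
\end{aligned} \tag{5.9}$$

where both (a) and (b) follow from Jensen's inequality and the convexity of the function $x \mapsto \log(2^x + c)$
for $c \geq 0$ (its second derivative, $c\,2^x\ln 2/(2^x + c)^2$, is non-negative), while (c) follows from (5.6). Hence,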


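$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} I(U_i;Z_i) &= \frac{1}{n}\sum_{i=1}^{n}\left(h(Z_i) - h(Z_i|U_i)\right)\\
&\leq \frac{1}{2}\log\left(2\pi e(Q + \sigma_e^2)\right) - \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_e^2)\right)\\
&= \frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_e^2}\right),
\end{aligned} \tag{5.10}$$

where the inequality follows from (5.8) and (5.9). On substituting (5.7) and (5.10)
into (5.2), we obtain

$$R_0 \leq \min\left(\frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_m^2}\right), \frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_e^2}\right)\right) + \delta(\epsilon). \tag{5.11}$$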


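We now develop an upper bound for $R_1$ as a function of the same parameters $Q$ and $\beta$
starting from (5.3). First, we eliminate the auxiliary random variable $V_i$ by introducing
the random variable $X_i$ as follows:

$$\begin{aligned}
&\frac{1}{n}\sum_{i=1}^{n}\left(I(V_i;Y_i|U_i) - I(V_i;Z_i|U_i)\right)\\
&\quad= \frac{1}{n}\sum_{i=1}^{n}\left(I(V_iX_i;Y_i|U_i) - I(X_i;Y_i|U_iV_i) - I(V_iX_i;Z_i|U_i) + I(X_i;Z_i|U_iV_i)\right)\\
&\quad\overset{(a)}{=} \frac{1}{n}\sum_{i=1}^{n}\left(I(X_i;Y_i|U_i) - I(X_i;Z_i|U_i) - I(X_i;Y_i|U_iV_i) + I(X_i;Z_i|U_iV_i)\right)\\
&\quad= \frac{1}{n}\sum_{i=1}^{n}\left(I(X_i;Y_i|U_i) - I(X_i;Z_i|U_i) - I(X_i;Y_iZ_i|U_iV_i) + I(X_i;Z_i|U_iV_iY_i) + I(X_i;Z_i|U_iV_i)\right)\\
&\quad\overset{(b)}{=} \frac{1}{n}\sum_{i=1}^{n}\left(I(X_i;Y_i|U_i) - I(X_i;Z_i|U_i) - I(X_i;Y_iZ_i|U_iV_i) + I(X_i;Z_i|U_iV_i)\right)\\
&\quad\overset{(c)}{\leq} \frac{1}{n}\sum_{i=1}^{n}\left(I(X_i;Y_i|U_i) - I(X_i;Z_i|U_i)\right),
\end{aligned}$$

where (a) follows from $I(V_i;Z_i|U_iX_i) = I(V_i;Y_i|U_iX_i) = 0$ since $U_i \rightarrow V_i \rightarrow X_i \rightarrow Y_iZ_i$
forms a Markov chain, (b) follows from $I(X_i;Z_i|U_iV_iY_i) = 0$ since $Z_i$ is stochastically
degraded with respect to $Y_i$ (the terms in (5.3) depend only on the marginals $p_{Y|X}$ and $p_{Z|X}$,
so we may take $Z_i = Y_i + N'_i$ as above, for which $X_i \rightarrow Y_i \rightarrow Z_i$ forms a Markov chain),
and (c) follows from $I(X_i;Z_i|U_iV_i) \leq I(X_i;Z_iY_i|U_iV_i)$. Next, we use (5.6) and (5.9) to
introduce $\beta$ and $Q$ as follows: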


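$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}\left(I(X_i;Y_i|U_i) - I(X_i;Z_i|U_i)\right)
&= \frac{1}{n}\sum_{i=1}^{n}\left(h(Y_i|U_i) - h(Y_i|X_iU_i) - h(Z_i|U_i) + h(Z_i|X_iU_i)\right)\\
&\leq \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_m^2)\right) - \frac{1}{2}\log(2\pi e\sigma_m^2)\\
&\quad - \frac{1}{2}\log\left(2\pi e(\beta Q + \sigma_e^2)\right) + \frac{1}{2}\log(2\pi e\sigma_e^2)\\
&= \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_e^2}\right).
\end{aligned} \tag{5.12}$$

By substituting (5.12) into (5.3), we obtain the desired upper bound for $R_1$:

$$R_1 \leq \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_e^2}\right) + \delta(\epsilon). \tag{5.13}$$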


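If $\sigma_e^2 < \sigma_m^2$, then the main channel is stochastically degraded with respect to the
eavesdropper's channel and $R_1 = 0$ by virtue of Proposition 3.4. By swapping the roles
of $Y_i$ and $Z_i$ in the proof, the reader can verify that (5.11) still holds. We combine the
two cases $\sigma_m^2 \leq \sigma_e^2$ and $\sigma_m^2 > \sigma_e^2$ by writing

$$\begin{aligned}
R_0 &\leq \min\left(\frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_m^2}\right), \frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_e^2}\right)\right) + \delta(\epsilon),\\
R_1 &\leq \left(\frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_e^2}\right)\right)^{+} + \delta(\epsilon).
\end{aligned}$$

To conclude the proof, notice that

$$Q \mapsto \min\left(\frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_m^2}\right), \frac{1}{2}\log\left(1 + \frac{(1-\beta)Q}{\beta Q + \sigma_e^2}\right)\right)
\quad\text{and}\quad
Q \mapsto \left(\frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta Q}{\sigma_e^2}\right)\right)^{+}$$

are non-decreasing functions of $Q$ and, by definition, $Q = (1/n)\sum_{i=1}^{n}\mathbb{E}[X_i^2] \leq P$.
Additionally, $\epsilon$ can be chosen arbitrarily small; therefore, it must hold that

$$\begin{aligned}
R_0 &\leq \min\left(\frac{1}{2}\log\left(1 + \frac{(1-\beta)P}{\beta P + \sigma_m^2}\right), \frac{1}{2}\log\left(1 + \frac{(1-\beta)P}{\beta P + \sigma_e^2}\right)\right),\\
R_1 &\leq \left(\frac{1}{2}\log\left(1 + \frac{\beta P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta P}{\sigma_e^2}\right)\right)^{+}.
\end{aligned}$$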


In contrast to the general BCC, the secrecy-capacity region of the Gaussian BCC does not
require the introduction of a prefix channel. In fact, the proof of Theorem 5.1 shows
that the choice $X = V$ is optimal. The typical shape of the region $\mathcal{C}^{\mathrm{BCC}}$ is illustrated in
Figure 5.2, together with the capacity region of the same Gaussian broadcast channel
without confidential messages. It may seem that communicating securely inflicts a strong
rate penalty and that a significant portion of the available capacity has to be sacrificed
to confuse the eavesdropper; however, this is again somewhat misleading because the
achievability proof shows that it is possible to transmit an additional individual message
to the legitimate receiver. On specializing the results of Section 3.6.1 and assuming
$\sigma_m^2 < \sigma_e^2$, we see that it is actually possible to transmit three messages over a Gaussian
broadcast channel with confidential messages:

(1) a common message to both Bob and Eve at rate
$$R_0 = \min\left(\frac{1}{2}\log\left(1 + \frac{(1-\beta)P}{\beta P + \sigma_m^2}\right), \frac{1}{2}\log\left(1 + \frac{(1-\beta)P}{\beta P + \sigma_e^2}\right)\right);$$

(2) a confidential message to Bob at rate
$$R_1 = \frac{1}{2}\log\left(1 + \frac{\beta P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{\beta P}{\sigma_e^2}\right);$$

(3) a public message to Bob with no guaranteed secrecy at rate
$$R_d = \frac{1}{2}\log\left(1 + \frac{\beta P}{\sigma_e^2}\right).$$

Figure 5.2 Secrecy-capacity region and capacity region of a Gaussian broadcast channel with
$\sigma_m = 0.8$, $\sigma_e = 1$, and $P = 1$. The light gray region is the secrecy-capacity region, whereas the
darker gray region is the capacity region without secrecy constraints.


Without any common message sent to Eve ($\beta = 1$ and $R_0 = 0$), the total rate effectively
available to communicate with Bob is $R_1 + R_d = \frac{1}{2}\log(1 + P/\sigma_m^2)$, which is the
capacity of the main channel.
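As a quick consistency check of the three-message scheme above, the following Python sketch evaluates $R_0$, $R_1$, and $R_d$ for a given $\beta$ and confirms that, for $\beta = 1$, $R_1 + R_d$ coincides with the main-channel capacity. The function names are hypothetical and base-2 logarithms are assumed.

```python
import math


def half_log2(snr):
    """0.5 * log2(1 + snr): capacity of a real AWGN channel at the given SNR."""
    return 0.5 * math.log2(1.0 + snr)


def three_message_rates(beta, P, sigma_m2, sigma_e2):
    """(R0, R1, Rd) of the three-message scheme, assuming sigma_m2 < sigma_e2."""
    if not sigma_m2 < sigma_e2:
        raise ValueError("this sketch assumes Bob has the less noisy channel")
    r0 = min(half_log2((1.0 - beta) * P / (beta * P + sigma_m2)),
             half_log2((1.0 - beta) * P / (beta * P + sigma_e2)))
    r1 = half_log2(beta * P / sigma_m2) - half_log2(beta * P / sigma_e2)
    rd = half_log2(beta * P / sigma_e2)
    return r0, r1, rd


if __name__ == "__main__":
    P, sigma_m2, sigma_e2 = 1.0, 0.5, 1.0  # hypothetical values
    r0, r1, rd = three_message_rates(1.0, P, sigma_m2, sigma_e2)
    # With beta = 1, R1 + Rd should equal the main-channel capacity.
    print(r0, r1 + rd, half_log2(P / sigma_m2))
```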

The secrecy capacity of the Gaussian WTC is obtained by specializing Theorem 5.1
to $\beta = 1$ ($R_0 = 0$).


Corollary 5.1 (Leung-Yan-Cheong and Hellman). The secrecy capacity of the Gaussian
WTC is

$$C_s = \left(\frac{1}{2}\log\left(1 + \frac{P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{P}{\sigma_e^2}\right)\right)^{+} = (C_m - C_e)^{+},$$

where $C_m = \frac{1}{2}\log(1 + P/\sigma_m^2)$ is the capacity of the main channel and $C_e = \frac{1}{2}\log(1 + P/\sigma_e^2)$
is that of the eavesdropper's channel.

The expression for the secrecy capacity of the Gaussian WTC implies that secure
communication is possible if and only if the legitimate receiver has a better signal-to- 
noise ratio (SNR) than that of the eavesdropper. In practice, this is likely to happen if the 
eavesdropper is located farther away from the transmitter than the legitimate receiver 
and receives attenuated signals. Near-field communication is a good example of such a 
situation, but this requires the eavesdropper to have a disadvantage at the physical layer 
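itself. Also notice that, unlike the channel capacity, the secrecy capacity does not grow
unbounded as $P \rightarrow \infty$. Taking the limit in Corollary 5.1, we obtain

$$\lim_{P\rightarrow\infty} C_s = \left(\frac{1}{2}\log\left(\frac{\sigma_e^2}{\sigma_m^2}\right)\right)^{+}.$$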


Therefore, increasing the power results in only marginal secrecy gains beyond a certain 
point. 
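The saturation of $C_s$ at high power can be observed numerically. The sketch below, a minimal illustration that assumes base-2 logarithms and uses hypothetical function names, evaluates Corollary 5.1 for increasing $P$ and compares the result with the limit $\left(\frac{1}{2}\log(\sigma_e^2/\sigma_m^2)\right)^+$.

```python
import math


def gaussian_wtc_secrecy_capacity(P, sigma_m2, sigma_e2):
    """Corollary 5.1: (C_m - C_e)^+ for the real Gaussian WTC, in bits."""
    c_m = 0.5 * math.log2(1.0 + P / sigma_m2)
    c_e = 0.5 * math.log2(1.0 + P / sigma_e2)
    return max(0.0, c_m - c_e)


def high_power_limit(sigma_m2, sigma_e2):
    """Limit of the secrecy capacity as P grows without bound."""
    return max(0.0, 0.5 * math.log2(sigma_e2 / sigma_m2))


if __name__ == "__main__":
    sigma_m2, sigma_e2 = 0.5, 1.0  # hypothetical noise variances
    for P in (1.0, 10.0, 100.0, 1e4, 1e6):
        c_s = gaussian_wtc_secrecy_capacity(P, sigma_m2, sigma_e2)
        print(f"P = {P:>9.0f}: C_s = {c_s:.6f}")
    print("limit:", high_power_limit(sigma_m2, sigma_e2))  # limit: 0.5
```

With $\sigma_m^2 = 0.5$ and $\sigma_e^2 = 1$, the secrecy capacity creeps toward half a bit per channel use, no matter how large $P$ becomes.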


Remark 5.1. All of the results above extend to the complex Gaussian WTC, for which
the noise sources are complex and circularly symmetric, that is $N_m \sim \mathcal{CN}(0,\sigma_m^2)$ and
$N_e \sim \mathcal{CN}(0,\sigma_e^2)$, and can account for constant (and known) multiplicative coefficients
$h_m \in \mathbb{C}$ and $h_e \in \mathbb{C}$ in the main channel and in the eavesdropper's channel, respectively.
By noting that a complex Gaussian WTC is equivalent to two parallel real Gaussian
WTCs with power constraint $P/2$ (and half the noise variance), and that a multiplicative
coefficient induces a scaling of the received SNR, the secrecy capacity follows directly
from the previous analysis and we have

$$C_s = \left(\log\left(1 + \frac{|h_m|^2P}{\sigma_m^2}\right) - \log\left(1 + \frac{|h_e|^2P}{\sigma_e^2}\right)\right)^{+}.$$
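The equivalence invoked in Remark 5.1 can be checked numerically. The sketch below evaluates the complex formula directly and, separately, as the sum of two real Gaussian WTCs, each with power $P/2$, half the noise variance, and received power scaled by $|h|^2$; the two values should agree. The function names are hypothetical and base-2 logarithms are assumed.

```python
import math


def complex_wtc_secrecy_capacity(P, h_m, h_e, sigma_m2, sigma_e2):
    """Remark 5.1 formula for the complex Gaussian WTC (bits per complex use)."""
    snr_m = abs(h_m) ** 2 * P / sigma_m2
    snr_e = abs(h_e) ** 2 * P / sigma_e2
    return max(0.0, math.log2(1.0 + snr_m) - math.log2(1.0 + snr_e))


def via_two_real_wtcs(P, h_m, h_e, sigma_m2, sigma_e2):
    """Same quantity from two parallel real WTCs with power P/2 each,
    half the noise variance, and received power scaled by |h|^2."""
    def real_wtc(power, gain_m, gain_e, var_m, var_e):
        # Secrecy capacity (C_m - C_e)^+ of one real Gaussian WTC.
        return max(0.0, 0.5 * math.log2(1.0 + gain_m * power / var_m)
                   - 0.5 * math.log2(1.0 + gain_e * power / var_e))

    one_dim = real_wtc(P / 2, abs(h_m) ** 2, abs(h_e) ** 2,
                       sigma_m2 / 2, sigma_e2 / 2)
    return 2 * one_dim


if __name__ == "__main__":
    args = (2.0, 1 + 1j, 0.5j, 1.0, 1.0)  # hypothetical P, h_m, h_e, variances
    print(complex_wtc_secrecy_capacity(*args), via_two_real_wtcs(*args))
```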


Remark 5.2. Suppose that the eavesdropper's noise is known to be Gaussian, but the
variance is known only to satisfy $\sigma_e^2 \geq \sigma_0^2$ for some fixed $\sigma_0^2$. One can check that a
set of Gaussian channels with noise variance $\sigma_e^2 \geq \sigma_0^2$ forms a class of stochastically
degraded channels, as introduced in Definition 3.10. Proposition 3.3 guarantees that a
wiretap code designed for an eavesdropper's noise variance $\sigma_0^2$ will also ensure secrecy
if the actual variance is $\sigma_e^2 \geq \sigma_0^2$.


5.1.2 Multiple-input multiple-output Gaussian wiretap channel


Generalizing the results obtained in the previous section to a multiple-input multiple- 
output (MIMO) situation is not merely useful for the sake of completeness; it also allows 
us to study the effect of spatial dimensionality and collusion of eavesdroppers on secure 
communication rates. The MIMO wiretap channel² is illustrated in Figure 5.3. The
numbers of antennas used by the transmitter, receiver, and eavesdropper are denoted
$n_t$, $n_r$, and $n_e$, respectively. Notice that the model does not distinguish between a single
eavesdropper with multiple antennas and a set of multiple eavesdroppers who collude
and process their measurements jointly. In practice, there is a physical limit to the number
of useful collocated antennas that one can deploy; therefore, a collusion of eavesdroppers
is likely to be more powerful than a single eavesdropper with multiple antennas.

Figure 5.3 Communication over a MIMO Gaussian WTC.

² This model is also called the multiple-input multiple-output multiple-eavesdropper (MIMOME) channel to
emphasize that all parties have multiple antennas.

For a Gaussian MIMO wiretap channel (Gaussian MIMO WTC for short), the rela- 
tionships between the inputs and outputs of the channel at each time i are 


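$$Y_i = \mathbf{H}_m X_i + N_{m,i} \quad\text{and}\quad Z_i = \mathbf{H}_e X_i + N_{e,i},$$

where $X_i \in \mathbb{C}^{n_t\times1}$ is the channel input vector, $\mathbf{H}_m \in \mathbb{C}^{n_r\times n_t}$ and $\mathbf{H}_e \in \mathbb{C}^{n_e\times n_t}$ are
deterministic complex matrices, $Y_i \in \mathbb{C}^{n_r\times1}$ is the legitimate receiver's observation vector,
and $Z_i \in \mathbb{C}^{n_e\times1}$ is the eavesdropper's observation vector. The channel matrices $\mathbf{H}_m$ and
$\mathbf{H}_e$ are fixed for the entire transmission and known to all three terminals. The noise
processes $\{N_{m,i}\}_{i\geq1}$ and $\{N_{e,i}\}_{i\geq1}$ are i.i.d.; at each time $i$ the vectors $N_{m,i} \in \mathbb{C}^{n_r\times1}$ and
$N_{e,i} \in \mathbb{C}^{n_e\times1}$ are circularly symmetric complex Gaussian random vectors with covariance
matrices $\mathbf{K}_m = \sigma_m^2\mathbf{I}_{n_r}$ and $\mathbf{K}_e = \sigma_e^2\mathbf{I}_{n_e}$, respectively, where $\mathbf{I}_n$ is the identity matrix
of dimension $n$. The channel input is also subject to the long-term average power
constraint $(1/n)\sum_{i=1}^{n}\mathbb{E}\left[\|X_i\|^2\right] \leq P$.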


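Theorem 5.2 (Khisti and Wornell, Oggier and Hassibi, Liu and Shamai). The secrecy
capacity of the Gaussian MIMO WTC is

$$C_s^{\mathrm{MIMO}} = \max_{\mathbf{K}_X \succeq 0,\ \mathrm{tr}(\mathbf{K}_X)\leq P}\left(\log\left|\mathbf{I}_{n_r} + \frac{1}{\sigma_m^2}\mathbf{H}_m\mathbf{K}_X\mathbf{H}_m^\dagger\right| - \log\left|\mathbf{I}_{n_e} + \frac{1}{\sigma_e^2}\mathbf{H}_e\mathbf{K}_X\mathbf{H}_e^\dagger\right|\right),$$

where the maximization is over all positive semi-definite matrices $\mathbf{K}_X$ such that
$\mathrm{tr}(\mathbf{K}_X) \leq P$.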


The expression for $C_s^{\mathrm{MIMO}}$ is, as one might expect, the natural generalization of the scalar
result in Corollary 5.1; however, the proof of this result is significantly more involved.
On the one hand, the achievability of rates below $C_s^{\mathrm{MIMO}}$ follows from Corollary 3.4,
which can be shown to hold for continuous vector channels with multiple inputs and
outputs. Choosing an $n_t \times n_t$ positive semi-definite matrix $\mathbf{K}$ such that $\mathrm{tr}(\mathbf{K}) \leq P$ and
substituting the random variables

$$V \sim \mathcal{CN}(\mathbf{0},\mathbf{K}) \quad\text{and}\quad X \triangleq V \tag{5.14}$$

into Corollary 3.4 yields the desired result. On the other hand, proving that the choice
of random variables in (5.14) is optimal is arduous because, in general and in contrast 
to the scalar Gaussian WTC, the eavesdropper’s channel is not stochastically degraded 
with respect to the main channel. We refer the reader to the bibliographical notes at the 
end of this chapter for references to various proofs. 

The maximization over covariance matrices subject to a trace constraint in the expression
for $C_s^{\mathrm{MIMO}}$ makes it difficult to develop much intuition from Theorem 5.2 directly.
Nevertheless, it is possible to develop a necessary and sufficient condition for $C_s^{\mathrm{MIMO}} = 0$
that admits a more intuitive interpretation.


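Proposition 5.1 (Khisti and Wornell). The secrecy capacity of the Gaussian MIMO
WTC is zero if and only if $\lambda_{\max}(\mathbf{H}_m, \mathbf{H}_e) \leq 1$, where

$$\lambda_{\max}(\mathbf{H}_m, \mathbf{H}_e) \triangleq \sup_{\mathbf{v}\in\mathbb{C}^{n_t}} \frac{\|\mathbf{H}_m\mathbf{v}\|/\sigma_m}{\|\mathbf{H}_e\mathbf{v}\|/\sigma_e}.$$

Sketch of proof. The kernel of a matrix $\mathbf{H}$ is $\mathrm{Ker}(\mathbf{H}) \triangleq \{\mathbf{v} : \mathbf{H}\mathbf{v} = \mathbf{0}\}$. If $\mathrm{Ker}(\mathbf{H}_e) \cap \mathrm{Ker}(\mathbf{H}_m)^\perp$
contains a non-zero vector, there exists a vector $\mathbf{v}$ such that $\|\mathbf{H}_m\mathbf{v}\| > 0$ and $\|\mathbf{H}_e\mathbf{v}\| = 0$. In this
case, $\lambda_{\max}$ is undefined and the transmitter can communicate securely by beamforming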
his signal in the direction of v, which is unheard by the eavesdropper. Notice that 
this strategy does not require a wiretap code and beamforming is sufficient to secure 
communications. 

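If $\mathrm{Ker}(\mathbf{H}_e) \cap \mathrm{Ker}(\mathbf{H}_m)^\perp = \{\mathbf{0}\}$, then beamforming alone is not sufficient to secure
communications. Nevertheless, if $\lambda_{\max}(\mathbf{H}_m, \mathbf{H}_e) > 1$, then there exists $\mathbf{v}$ with $\|\mathbf{v}\| = 1$ such that
$\|\mathbf{H}_m\mathbf{v}\|/\sigma_m > \|\mathbf{H}_e\mathbf{v}\|/\sigma_e$; in other words, even though the eavesdropper overhears all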
signals, there exists (at least) one direction in which the legitimate receiver benefits from 
a higher gain than the eavesdropper. Substituting the random variables 


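$$V \sim \mathcal{CN}(\mathbf{0}, P\mathbf{v}\mathbf{v}^\dagger) \quad\text{and}\quad X \triangleq V$$

into Corollary 3.4 shows that

$$C_s \geq \log\left|\mathbf{I}_{n_r} + \frac{P}{\sigma_m^2}\mathbf{H}_m\mathbf{v}\mathbf{v}^\dagger\mathbf{H}_m^\dagger\right| - \log\left|\mathbf{I}_{n_e} + \frac{P}{\sigma_e^2}\mathbf{H}_e\mathbf{v}\mathbf{v}^\dagger\mathbf{H}_e^\dagger\right|. \tag{5.15}$$

Using the identity $\log|\mathbf{I} + \mathbf{A}\mathbf{B}| = \log|\mathbf{I} + \mathbf{B}\mathbf{A}|$, we can rewrite (5.15) as

$$C_s \geq \log\left(1 + \frac{P}{\sigma_m^2}\|\mathbf{H}_m\mathbf{v}\|^2\right) - \log\left(1 + \frac{P}{\sigma_e^2}\|\mathbf{H}_e\mathbf{v}\|^2\right),$$

and the right-hand side is strictly positive since $\|\mathbf{H}_m\mathbf{v}\|/\sigma_m > \|\mathbf{H}_e\mathbf{v}\|/\sigma_e$.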

If $\lambda_{\max}(\mathbf{H}_m, \mathbf{H}_e) \leq 1$, it is also possible to show that $C_s^{\mathrm{MIMO}} = 0$. The proof hinges
on a closed-form expression for $C_s^{\mathrm{MIMO}}$ in the high-SNR regime obtained using a
generalized singular-value decomposition of $\mathbf{H}_m$ and $\mathbf{H}_e$. We refer the reader to [83, 84]
for details of the proof. 
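The rewritten form of (5.15) is easy to evaluate for a candidate direction $\mathbf{v}$. The pure-Python sketch below does so for small matrices stored as lists of rows, a choice made only to avoid external dependencies. The function names and the example matrices are hypothetical, base-2 logarithms are assumed, and a negative return value simply means that $\mathbf{v}$ favors the eavesdropper.

```python
import math


def sq_norm_of_product(H, v):
    """||H v||^2 for a matrix H (list of rows) and a vector v (list)."""
    return sum(abs(sum(h * x for h, x in zip(row, v))) ** 2 for row in H)


def beamforming_rate(H_m, H_e, v, P, sigma_m2, sigma_e2):
    """Right-hand side of the rewritten bound (5.15) for direction v, in bits.

    v is first normalized to unit norm.
    """
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    if norm == 0.0:
        raise ValueError("v must be non-zero")
    u = [x / norm for x in v]
    snr_m = P * sq_norm_of_product(H_m, u) / sigma_m2
    snr_e = P * sq_norm_of_product(H_e, u) / sigma_e2
    return math.log2(1.0 + snr_m) - math.log2(1.0 + snr_e)


if __name__ == "__main__":
    # Hypothetical 2x2 main channel and 1x2 eavesdropper channel.
    H_m = [[1.0, 0.0], [0.0, 1.0]]
    H_e = [[0.0, 1.0]]
    # v = (1, 0) lies in Ker(H_e) but not in Ker(H_m).
    print(beamforming_rate(H_m, H_e, [1.0, 0.0], P=3.0, sigma_m2=1.0, sigma_e2=1.0))  # 2.0
```

In the example, $\mathbf{v}$ lies in $\mathrm{Ker}(\mathbf{H}_e)$ but not in $\mathrm{Ker}(\mathbf{H}_m)$, which is the first case of the sketch of proof: beamforming alone already yields a positive rate.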


As expected, Proposition 5.1 confirms that secure communication is possible if the 
transmitter can beamform his signals in such a direction that the eavesdropper does 
not overhear. Perhaps more interestingly, Proposition 5.1 also shows that the secrecy 
capacity is strictly positive as long as the transmitter can beamform his signals in a
direction for which the eavesdropper obtains a lower SNR than does the legitimate
receiver. In other words, the combination of coding and beamforming is more powerful
than beamforming alone. Without additional assumptions about the specific structure of
$\mathbf{H}_m$ and $\mathbf{H}_e$, little can be said regarding the existence of a secure beamforming direction.
Nevertheless, if the entries of $\mathbf{H}_m$ and $\mathbf{H}_e$ are generated i.i.d. according to $\mathcal{CN}(0,1)$ and
if their realizations are known to Alice, Bob, and Eve, the behavior of $C_s^{\mathrm{MIMO}}$ can be
further analyzed using tools from random-matrix theory.

Figure 5.4 Condition for zero secrecy capacity in the limit of a large number of antennas. The
elements of $\mathbf{H}_m$ and $\mathbf{H}_e$ are assumed to be generated i.i.d. according to the distribution $\mathcal{CN}(0,1)$,
and $\sigma_m = \sigma_e = 1$.


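Proposition 5.2 (Khisti and Wornell). Suppose that $\sigma_m = \sigma_e = 1$ and $n_t$, $n_r$, and $n_e$
go to infinity while the ratios $\alpha \triangleq n_r/n_e$ and $\beta \triangleq n_t/n_e$ are kept fixed. Then, the
secrecy capacity converges almost surely to zero if and only if

$$0 \leq \beta \leq \frac{1}{2}, \quad 0 \leq \alpha \leq 1, \quad\text{and}\quad \alpha \leq \left(1 - \sqrt{2\beta}\right)^2.$$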


The proof of Proposition 5.2 can be found in [83, 84]. Proposition 5.2 allows us 
to relate the possibility of secure communication directly to the number of antennas 
deployed by Alice, Bob, and Eve. As expected, and as illustrated in Figure 5.4, the 
secrecy capacity is positive as long as Eve does not deploy too many antennas compared 
with the numbers deployed by Alice and Bob. For instance, if $\alpha = 0$, which corresponds
to a single receive antenna for Bob, the secrecy capacity is positive if Eve has fewer
than twice as many antennas as Alice. For $\beta = 0$, which corresponds to a single transmit
antenna for Alice, the secrecy capacity is positive provided that Eve has fewer antennas 
than Bob. This leads to the pessimistic conclusion that little can be done against an all- 
powerful eavesdropper who is able to deploy many antennas; however, one can perhaps 
draw a more optimistic conclusion and argue that Alice and Bob can mitigate the impact 
of colluding eavesdroppers by deploying multiple transmit and receive antennas. 
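The asymptotic condition of Proposition 5.2 is simple to test for given antenna ratios. In the sketch below, finite antenna counts serve only to form the ratios $\alpha$ and $\beta$, so the output is a large-system prediction rather than a statement about a specific finite configuration. The function name is hypothetical, and treating the boundary cases as non-strict inequalities is an assumption of the sketch.

```python
import math


def secrecy_capacity_vanishes(n_t, n_r, n_e):
    """Large-antenna condition of Proposition 5.2 with alpha = n_r / n_e and
    beta = n_t / n_e. Returns True when the secrecy capacity is predicted to
    converge to zero (boundary cases treated as non-strict: an assumption)."""
    alpha = n_r / n_e
    beta = n_t / n_e
    return (beta <= 0.5 and alpha <= 1.0
            and alpha <= (1.0 - math.sqrt(2.0 * beta)) ** 2)


if __name__ == "__main__":
    # Hypothetical antenna counts (n_t, n_r, n_e).
    for config in [(4, 1, 40), (4, 1, 7), (1, 8, 4), (8, 8, 64)]:
        print(config, secrecy_capacity_vanishes(*config))
```

The second configuration illustrates the $\alpha \approx 0$ discussion: Eve has fewer than twice as many antennas as Alice, so the secrecy capacity is predicted to remain positive.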
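We conclude this section on the MIMO Gaussian WTC with a brief discussion of a
suboptimal strategy that sheds light on the choice of the input covariance matrix $\mathbf{K}_X$ in
Theorem 5.2. Let $r = \mathrm{rk}(\mathbf{H}_m)$ and consider the compact singular-value decomposition
$\mathbf{H}_m = \mathbf{U}_m\boldsymbol{\Lambda}_m\mathbf{V}_m^\dagger$, in which $\mathbf{U}_m \in \mathbb{C}^{n_r\times r}$ and $\mathbf{V}_m \in \mathbb{C}^{n_t\times r}$ have orthonormal columns, and
$\boldsymbol{\Lambda}_m \in \mathbb{C}^{r\times r}$ is a diagonal matrix with non-zero diagonal terms. We construct a unitary
matrix $\mathbf{V} \in \mathbb{C}^{n_t\times n_t}$ by appending appropriate column vectors to $\mathbf{V}_m$, and we let

$$\mathbf{V} \triangleq (\mathbf{V}_m\ \mathbf{V}_a) \quad\text{with}\quad \mathbf{V}_m = (\mathbf{v}_1 \dots \mathbf{v}_r) \quad\text{and}\quad \mathbf{V}_a \triangleq (\mathbf{v}_{r+1} \dots \mathbf{v}_{n_t}).$$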


This decomposition allows us to interpret the channel to the legitimate receiver as $n_t$
parallel channels, of which only the first r can effectively be used for communication. 


Alice simultaneously transmits $r$ symbols $b_1, \dots, b_r$ by sending the vector

$$\mathbf{x} = \sum_{j=1}^{r} b_j\mathbf{v}_j + \sum_{j=r+1}^{n_t} n_j\mathbf{v}_j,$$

where $\{n_j\}_{j=r+1}^{n_t}$ are dummy noise symbols. These dummy symbols do not affect
Bob’s received signal because they lie in the kernel of Hm, but they are mixed with 
the useful symbols in Eve’s received signal. This scheme is called an artificial noise 
transmission strategy since it consists essentially of transmitting information in the 
directions associated with the non-zero singular values of $\mathbf{H}_m$, and sending noise in all other
directions to harm the eavesdropper. A simple way of ensuring that the power constraint
$(1/n)\sum_{i=1}^{n}\mathbb{E}\left[\|X_i\|^2\right] \leq P$ is satisfied is to allocate the same average power $P/n_t$ to all
$n_t$ sub-channels. In this case, achievable secure communication rates are given by the
following proposition.


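Proposition 5.3 (Khisti et al.). The artificial noise transmission strategy achieves all
the secure rates $R_s$ such that

$$R_s < \log\left|\mathbf{I} + \frac{P}{\sigma_m^2 n_t}\boldsymbol{\Lambda}_m\boldsymbol{\Lambda}_m^\dagger\right| + \log\left|\mathbf{V}_m^\dagger\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}\mathbf{V}_m\right|.$$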


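Proof. The secure communication rates are obtained by substituting the random variables

$$V \triangleq \sum_{j=1}^{r} B_j\mathbf{v}_j \quad\text{and}\quad X = V + \sum_{j=r+1}^{n_t} N_j\mathbf{v}_j$$

into Corollary 3.4, where the random variables $\{B_j\}_{j=1}^{r}$ and $\{N_j\}_{j=r+1}^{n_t}$ are i.i.d. and
drawn according to $\mathcal{CN}(0, P/n_t)$. Then,

$$\begin{align}
I(V;Y) &= h(Y) - h(Y|V) \tag{5.16}\\
&= \log\left|\sigma_m^2\mathbf{I} + \frac{P}{n_t}\mathbf{H}_m\mathbf{V}\mathbf{V}^\dagger\mathbf{H}_m^\dagger\right| - \log\left|\sigma_m^2\mathbf{I}\right| \notag\\
&= \log\left|\mathbf{I} + \frac{P}{\sigma_m^2 n_t}\mathbf{U}_m\boldsymbol{\Lambda}_m\mathbf{V}_m^\dagger\mathbf{V}\mathbf{V}^\dagger\mathbf{V}_m\boldsymbol{\Lambda}_m^\dagger\mathbf{U}_m^\dagger\right| \notag\\
&= \log\left|\mathbf{I} + \frac{P}{\sigma_m^2 n_t}\boldsymbol{\Lambda}_m\boldsymbol{\Lambda}_m^\dagger\right|, \tag{5.17}
\end{align}$$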
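where we have used $\mathbf{V}\mathbf{V}^\dagger = \mathbf{I}$, $\mathbf{V}_m^\dagger\mathbf{V}_m = \mathbf{I}$, $\mathbf{U}_m^\dagger\mathbf{U}_m = \mathbf{I}$, and $\log|\mathbf{I} + \mathbf{A}\mathbf{B}| = \log|\mathbf{I} + \mathbf{B}\mathbf{A}|$ to obtain
the last equality. Similarly,

$$\begin{aligned}
I(V;Z) &= h(Z) - h(Z|V)\\
&= \log\left|\sigma_e^2\mathbf{I} + \frac{P}{n_t}\mathbf{H}_e\mathbf{V}\mathbf{V}^\dagger\mathbf{H}_e^\dagger\right| - \log\left|\sigma_e^2\mathbf{I} + \frac{P}{n_t}\mathbf{H}_e\mathbf{V}_a\mathbf{V}_a^\dagger\mathbf{H}_e^\dagger\right|\\
&= \log\left|\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right| - \log\left|\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{V}_a\mathbf{V}_a^\dagger\mathbf{H}_e^\dagger\mathbf{H}_e\right|.
\end{aligned}$$

Since $\mathbf{V}$ is unitary, $\mathbf{V}_a\mathbf{V}_a^\dagger + \mathbf{V}_m\mathbf{V}_m^\dagger = \mathbf{I}$; hence,

$$\begin{aligned}
\log\left|\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{V}_a\mathbf{V}_a^\dagger\mathbf{H}_e^\dagger\mathbf{H}_e\right|
&= \log\left|\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\left(\mathbf{I} - \mathbf{V}_m\mathbf{V}_m^\dagger\right)\mathbf{H}_e^\dagger\mathbf{H}_e\right|\\
&= \log\left|\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)\left(\mathbf{I} - \frac{P}{\sigma_e^2 n_t}\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}\mathbf{V}_m\mathbf{V}_m^\dagger\mathbf{H}_e^\dagger\mathbf{H}_e\right)\right|.
\end{aligned}$$

Therefore,

$$\begin{align}
I(V;Z) &= -\log\left|\mathbf{I} - \frac{P}{\sigma_e^2 n_t}\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}\mathbf{V}_m\mathbf{V}_m^\dagger\mathbf{H}_e^\dagger\mathbf{H}_e\right| \notag\\
&= -\log\left|\mathbf{I} - \frac{P}{\sigma_e^2 n_t}\mathbf{V}_m^\dagger\mathbf{H}_e^\dagger\mathbf{H}_e\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}\mathbf{V}_m\right| \notag\\
&= -\log\left|\mathbf{V}_m^\dagger\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}\mathbf{V}_m\right|, \tag{5.18}
\end{align}$$

where the last equality uses $\frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1} = \mathbf{I} - \left(\mathbf{I} + \frac{P}{\sigma_e^2 n_t}\mathbf{H}_e^\dagger\mathbf{H}_e\right)^{-1}$ and $\mathbf{V}_m^\dagger\mathbf{V}_m = \mathbf{I}$.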


By Corollary 3.4, all the rates $R_s < I(V;Y) - I(V;Z)$ with $I(V;Y)$ given by (5.17)
and $I(V;Z)$ given by (5.18) are achievable.
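The closed forms (5.17) and (5.18) can be evaluated without a linear-algebra library. The pure-Python sketch below does so for small matrices and, as a cross-check, also computes $I(V;Z)$ from the unsimplified expression obtained just before the manipulation with $\mathbf{V}_a\mathbf{V}_a^\dagger + \mathbf{V}_m\mathbf{V}_m^\dagger = \mathbf{I}$. The standard library has no SVD, so the sketch assumes that the caller supplies the compact SVD of $\mathbf{H}_m$ (the columns $\mathbf{V}_m$, the complement $\mathbf{V}_a$, and the singular values). All helper names are hypothetical, and base-2 logarithms are assumed.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def herm(A):
    """Conjugate transpose A^dagger."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def eye(n):
    """n x n identity matrix with complex entries."""
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def plus_scaled(A, B, c):
    """A + c * B for matrices of equal size."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def det_and_inverse(A):
    """Determinant and inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[complex(x) for x in row] + [1.0 + 0j if i == j else 0j for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0 + 0j
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det
        p = M[col][col]
        det *= p
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return det, [row[n:] for row in M]


def log2det(A):
    """log2 |A| for a Hermitian positive-definite A, whose determinant is real."""
    det, _ = det_and_inverse(A)
    return math.log2(det.real)


def artificial_noise_rate(H_e, V_m, singular_values, P, sigma_m2, sigma_e2):
    """Return (I(V;Y) - I(V;Z), I(V;Z)) from (5.17) and (5.18), in bits."""
    n_t = len(V_m)
    c_m = P / (sigma_m2 * n_t)
    c_e = P / (sigma_e2 * n_t)
    # (5.17): Lambda_m is diagonal, so the log-determinant is a sum.
    i_vy = sum(math.log2(1.0 + c_m * abs(s) ** 2) for s in singular_values)
    # (5.18): -log2 |V_m^dagger (I + c_e H_e^dagger H_e)^{-1} V_m|
    A = plus_scaled(eye(n_t), matmul(herm(H_e), H_e), c_e)
    _, A_inv = det_and_inverse(A)
    i_vz = -log2det(matmul(herm(V_m), matmul(A_inv, V_m)))
    return i_vy - i_vz, i_vz


def leakage_before_simplification(H_e, V_a, P, sigma_e2):
    """I(V;Z) = log2|I + c H_e H_e^dagger| - log2|I + c H_e V_a V_a^dagger H_e^dagger|."""
    n_t, n_e = len(V_a), len(H_e)
    c_e = P / (sigma_e2 * n_t)
    HeVa = matmul(H_e, V_a)
    return (log2det(plus_scaled(eye(n_e), matmul(H_e, herm(H_e)), c_e))
            - log2det(plus_scaled(eye(n_e), matmul(HeVa, herm(HeVa)), c_e)))
```

For instance, with $\mathbf{H}_m = (1\ 0)$, so that $\mathbf{V}_m = (1, 0)^T$, $\mathbf{V}_a = (0, 1)^T$, and singular value 1, together with $\mathbf{H}_e = (1\ 1)$, $P = 2$, and unit noise variances, the leakage is $\log_2 1.5$ bits by both routes.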


The idea of introducing artificial noise into the system to hinder the eavesdropper is 
a powerful concept that will reappear in Chapter 8 for multi-user secure communication 
systems. 


Remark 5.3. Although the signaling used in the artificial noise transmission strategy
relies solely on knowledge of $\mathbf{H}_m$ and does not exploit knowledge of the eavesdropper's
channel $\mathbf{H}_e$, notice that knowledge of $\mathbf{H}_e$ is required in order to design the wiretap
code and select the secure communication rate appropriately. Hence, the artificial noise
strategy operates in only a semi-blind fashion.


5.1.3 Gaussian source model


A Gaussian source model for secret-key agreement consists of a memoryless source
$(\mathcal{XYZ}, p_{XYZ})$ whose components are jointly Gaussian with zero mean. The distribution
is entirely characterized by the second-order moments

$$\mathbb{E}[X^2] = \sigma_X^2, \quad \mathbb{E}[Y^2] = \sigma_Y^2, \quad \mathbb{E}[Z^2] = \sigma_Z^2,$$

$$\mathbb{E}[XY] = \rho_1\sigma_X\sigma_Y, \quad \mathbb{E}[XZ] = \rho_2\sigma_X\sigma_Z, \quad \mathbb{E}[YZ] = \rho_3\sigma_Y\sigma_Z,$$

where $\rho_1$, $\rho_2$, and $\rho_3$ are the correlation coefficients of the source components. The
definitions of key-distillation strategies, achievable key rates, and the secret-key capacity
are those used for discrete memoryless sources. A closer look at the proof of the upper
bound for the secret-key capacity in Section 4.2.1 shows that the derivation does not
rely on the discrete nature of the source $(\mathcal{XYZ}, p_{XYZ})$; however, the achievability proof
based on a conceptual WTC relies on the crypto lemma, which does not apply to Gaussian 
random variables. Nevertheless, we show in this section that the lower bound is still valid 
for a Gaussian source model. 


Proposition 5.4. For a Gaussian source model,

$$I(X;Y) - \min\left(I(X;Z), I(Y;Z)\right) \leq C_s^{\mathrm{SK}} \leq \min\left(I(X;Y), I(X;Y|Z)\right).$$


Proof. The upper bound follows from the same steps as in the proof of Theorem 4.1,
and we need only show that the lower bound holds. To do so, we construct a conceptual
WTC as in Section 4.2.1, but this time we use addition over the real numbers. Specifically,
to send a symbol $U \in \mathbb{R}$ independent of the source to Bob, Alice observes a realization
$X$ of the source and transmits $U + X$ over the public channel, where $+$ denotes the usual
addition over $\mathbb{R}$. This creates a conceptual memoryless WTC, for which Bob observes
$(U + X, Y)$ and Eve observes $(U + X, Z)$. From the results of Chapter 3, we know that
the secrecy capacity is at least

$$I(U; Y, U + X) - I(U; Z, U + X),$$

where the distribution $p_U$ can be chosen arbitrarily; here, we choose $U \sim \mathcal{N}(0, P)$ for
some $P > 0$. Using the chain rule of mutual information repeatedly, we obtain


$$\begin{align}
&I(U; Y, U + X) - I(U; Z, U + X) \notag\\
&\quad= I(U;Y) + I(U; U + X|Y) - I(U;Z) - I(U; U + X|Z) \notag\\
&\quad= h(U|Z) - h(U|Y) + h(U + X|Y) - h(U + X|UY) - h(U + X|Z) + h(U + X|UZ) \notag\\
&\quad\overset{(a)}{=} h(U|Z) - h(U|Y) + h(U + X|Y) - h(X|Y) - h(U + X|Z) + h(X|Z). \tag{5.19}
\end{align}$$

Equality (a) follows because $U$ is independent of $(X, Y, Z)$, which implies that
$h(U + X|UY) = h(X|UY) = h(X|Y)$ and $h(U + X|UZ) = h(X|Z)$.
Now,

$$\begin{aligned}
h(U|Z) - h(U + X|Z) &\leq h(U|Z) - h(U + X|Z, X)\\
&= h(U|Z) - h(U|Z, X)\\
&= 0,
\end{aligned}$$
where the last equality follows again from the independence of $U$ and $(X, Z)$. Also,

$$h(U|Z) - h(U + X|Z) \geq h(U|Z) - h(U + X) \geq \frac{1}{2}\log\left(\frac{P}{P + \sigma_X^2}\right),$$

where the last inequality follows from $h(U|Z) = h(U) = \frac{1}{2}\log(2\pi eP)$ and the bound
$h(U + X) \leq \frac{1}{2}\log\left(2\pi e(P + \sigma_X^2)\right)$. Because all communication takes place over a public
noiseless channel (of infinite capacity), $P$ can be arbitrarily large,³ and, for any $\epsilon > 0$,
we can choose $P$ such that

$$\left|h(U|Z) - h(U + X|Z)\right| < \frac{\epsilon}{2}. \tag{5.20}$$

Repeating the same argument with $Y$ instead of $Z$ shows that for $P$ large enough we
have

$$\left|h(U|Y) - h(U + X|Y)\right| < \frac{\epsilon}{2}. \tag{5.21}$$

On combining (5.19), (5.20), and (5.21), we obtain

$$\begin{aligned}
I(U; Y, U + X) - I(U; Z, U + X) &> h(X|Z) - h(X|Y) - \epsilon\\
&= I(X;Y) - I(X;Z) - \epsilon.
\end{aligned}$$

Since $\epsilon > 0$ can be chosen arbitrarily small, we must have

$$C_s^{\mathrm{SK}} \geq I(X;Y) - I(X;Z).$$


Similarly, by interchanging the roles of $X$ and $Y$ (that is, letting Bob transmit $U + Y$ over the
public channel), we can show that

$$C_s^{\mathrm{SK}} \geq I(X;Y) - I(Y;Z).$$
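The only quantitative requirement on $P$ in the proof above is that $\frac{1}{2}\log(1 + \sigma_X^2/P) < \epsilon/2$, which holds whenever $P > \sigma_X^2/(2^\epsilon - 1)$. The short Python sketch below makes this threshold explicit; base-2 logarithms and the function names are assumptions of the sketch.

```python
import math


def entropy_gap_bound(P, var_x):
    """Bound 0.5*log2(1 + var_x/P) on |h(U|Z) - h(U+X|Z)| from the proof."""
    return 0.5 * math.log2(1.0 + var_x / P)


def power_threshold(var_x, eps):
    """Any P strictly above var_x / (2**eps - 1) keeps the bound below eps/2."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return var_x / (2.0 ** eps - 1.0)


if __name__ == "__main__":
    var_x, eps = 1.0, 0.01  # hypothetical source variance and target slack
    P = 2.0 * power_threshold(var_x, eps)
    print(P, entropy_gap_bound(P, var_x), eps / 2)
```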


Remark 5.4. There is no loss of generality in restricting our analysis to a centered
Gaussian source model. If $X$, $Y$, and $Z$ have non-zero means $\mu_1$, $\mu_2$, and $\mu_3$, one
can simply consider the centered random variables $X' \triangleq X - \mu_1$, $Y' \triangleq Y - \mu_2$, and
$Z' \triangleq Z - \mu_3$ and note that the bounds on the secret-key capacity remain unchanged.

The bounds given by Proposition 5.4 can be computed explicitly in terms of the
parameters $\rho_1$, $\rho_2$, and $\rho_3$.


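Corollary 5.2. The secret-key capacity of a Gaussian source model satisfies

$$\max\left(\frac{1}{2}\log\left(\frac{1-\rho_2^2}{1-\rho_1^2}\right), \frac{1}{2}\log\left(\frac{1-\rho_3^2}{1-\rho_1^2}\right)\right)
\leq C_s^{\mathrm{SK}} \leq
\min\left(\frac{1}{2}\log\left(\frac{1}{1-\rho_1^2}\right), \frac{1}{2}\log\left(\frac{(1-\rho_2^2)(1-\rho_3^2)}{1 + 2\rho_1\rho_2\rho_3 - \rho_1^2 - \rho_2^2 - \rho_3^2}\right)\right).$$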
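The bounds of Corollary 5.2 depend only on $\rho_1$, $\rho_2$, and $\rho_3$, and are straightforward to evaluate. The sketch below returns the lower and upper bounds and rejects coefficients that do not define a positive-definite covariance matrix. The function name is hypothetical and base-2 logarithms are assumed.

```python
import math


def gaussian_sk_bounds(rho1, rho2, rho3):
    """Lower and upper bounds of Corollary 5.2 on C_s^SK, in bits.

    rho1, rho2, rho3 are the correlation coefficients of (X, Y), (X, Z), (Y, Z).
    """
    # Determinant of the correlation matrix of (X, Y, Z).
    det_r = 1.0 + 2.0 * rho1 * rho2 * rho3 - rho1 ** 2 - rho2 ** 2 - rho3 ** 2
    if max(abs(rho1), abs(rho2), abs(rho3)) >= 1.0 or det_r <= 0.0:
        raise ValueError("coefficients must define a positive-definite covariance")
    lower = max(0.5 * math.log2((1.0 - rho2 ** 2) / (1.0 - rho1 ** 2)),
                0.5 * math.log2((1.0 - rho3 ** 2) / (1.0 - rho1 ** 2)))
    i_xy = 0.5 * math.log2(1.0 / (1.0 - rho1 ** 2))
    i_xy_given_z = 0.5 * math.log2((1.0 - rho2 ** 2) * (1.0 - rho3 ** 2) / det_r)
    upper = min(i_xy, i_xy_given_z)
    return lower, upper


if __name__ == "__main__":
    print(gaussian_sk_bounds(0.9, 0.5, 0.4))  # hypothetical coefficients
```

When $Z$ is independent of $(X, Y)$, that is, $\rho_2 = \rho_3 = 0$, both bounds reduce to $I(X;Y)$, as expected.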


³ Note that our ability to choose $P$ as large as desired is a mathematical convenience rather than a realistic
solution. In practice, even public communication would be subject to a power constraint and thus to a rate
constraint.


Figure 5.5 Communication over a wireless channel in the presence of an eavesdropper.


Remark 5.5. Note that if $|\rho_2| > |\rho_1|$ and $|\rho_3| > |\rho_1|$, the lower bound obtained is negative
and not really useful. Nevertheless, it is sometimes still possible to obtain a positive
secret-key rate using an advantage-distillation protocol, as discussed in Section 4.3.1.


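Example 5.1. An interesting instance of a Gaussian source model is one in which
$X \sim \mathcal{N}(0, P)$ is a Gaussian random variable transmitted over a Gaussian WTC, such
that

$$Y = X + N_m \quad\text{and}\quad Z = X + N_e,$$

with $N_m \sim \mathcal{N}(0, \sigma_m^2)$ and $N_e \sim \mathcal{N}(0, \sigma_e^2)$. In this case, Corollary 4.1 and Proposition 5.4
apply and $C_s^{\mathrm{SK}} = I(X;Y) - I(Y;Z)$, which can be computed explicitly as

$$C_s^{\mathrm{SK}} = \frac{1}{2}\log\left(1 + \frac{P\sigma_e^2}{\sigma_m^2(P + \sigma_e^2)}\right).$$

Note that $C_s^{\mathrm{SK}}$ is positive even if $\sigma_e^2 < \sigma_m^2$. In contrast, the secrecy capacity of the same
Gaussian WTC given in Theorem 5.1 is

$$C_s = \left(\frac{1}{2}\log\left(1 + \frac{P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1 + \frac{P}{\sigma_e^2}\right)\right)^{+},$$

which is zero if $\sigma_e^2 \leq \sigma_m^2$. Hence, the impossibility of secure communication over a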
Gaussian WTC when the eavesdropper has a higher SNR than that of the legitimate 
receiver is solely due to the restrictions placed on the communication scheme. In reality, 
as long as the eavesdropper obtains a different observation, the legitimate parties always 
have an advantage over the eavesdropper, and they can distill a secret key. 
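The contrast drawn in Example 5.1 can be reproduced numerically. The sketch below evaluates the closed form of $C_s^{\mathrm{SK}}$, recomputes it as $I(X;Y) - I(Y;Z)$ from the second-order statistics of $Y$ and $Z$, and compares it with the secrecy capacity of the same Gaussian WTC in a case in which Eve is less noisy than Bob. The function names are hypothetical and base-2 logarithms are assumed.

```python
import math


def example_key_capacity(P, sigma_m2, sigma_e2):
    """Closed form of C_s^SK in Example 5.1, in bits."""
    return 0.5 * math.log2(1.0 + P * sigma_e2 / (sigma_m2 * (P + sigma_e2)))


def example_secrecy_capacity(P, sigma_m2, sigma_e2):
    """(C_m - C_e)^+ for the same Gaussian WTC, in bits."""
    return max(0.0, 0.5 * math.log2(1.0 + P / sigma_m2)
               - 0.5 * math.log2(1.0 + P / sigma_e2))


def key_capacity_from_statistics(P, sigma_m2, sigma_e2):
    """I(X;Y) - I(Y;Z) computed from the second-order statistics of Y and Z."""
    i_xy = 0.5 * math.log2(1.0 + P / sigma_m2)
    var_y, var_z, cov_yz = P + sigma_m2, P + sigma_e2, P
    i_yz = 0.5 * math.log2(var_y * var_z / (var_y * var_z - cov_yz ** 2))
    return i_xy - i_yz


if __name__ == "__main__":
    # Hypothetical values in which Eve is less noisy than Bob.
    P, sigma_m2, sigma_e2 = 1.0, 2.0, 1.0
    print(example_key_capacity(P, sigma_m2, sigma_e2))      # positive
    print(example_secrecy_capacity(P, sigma_m2, sigma_e2))  # 0.0
```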


5.2 Wireless channels


The general channel model we use to investigate secure wireless communications is 
illustrated in Figure 5.5. For simplicity, we focus on the transmission of a single secure 
message and the characterization of the secrecy capacity, but all results described hereafter
generalize to include a common message for the legitimate receiver and the eavesdropper.
The communication channel between Alice and Bob is modeled as a fading
channel, characterized by a random complex multiplicative coefficient $H_m$ and an
independent complex additive white Gaussian noise $N_m$. The coefficient $H_m$ is called the
fading coefficient, and accounts for the multipath interference occurring in a wireless
transmission. The square of the magnitude of the fading coefficient, $G_m \triangleq |H_m|^2$, is
called the fading gain. Similarly, the channel between Alice and the eavesdropper Eve is
modeled as another fading channel with fading coefficient $H_e$, fading gain $G_e \triangleq |H_e|^2$,
and additive white Gaussian noise $N_e$. In a continuous-time model, the time interval
during which the fading coefficients remain almost constant is called a coherence interval;
with a slight abuse of terminology, we call a realization $(h_m, h_e)$ of the fading coefficients
a coherence interval as well.
The relationships between inputs and outputs for each channel use i are given by 


Yi = Hm Xi + Nmi and Z; = HeiX; + Nes, 


where Hini, Nm, He, and Ne; are mutually independent. The input of the channel is 
also subject to the power constraint 


-X E[X?] <P. 


The noise processes {Nm,; },,, and {Ne,; },,, are iid. complex Gaussian with 
Nmi ~ CN(0,02) and Ne; ~ CN (0, 02). 


Different types of fading can be modeled by choosing the statistics of the fading coefficients $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ appropriately. In the remainder of the section, we focus on three standard fading models.

• Ergodic-fading model: this model characterizes a situation in which the duration of a coherence interval is on the order of the time required to send a single symbol. The processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are mutually independent and i.i.d.; fading coefficients change at every channel use and a codeword experiences many fading realizations.

• Block-fading model: in this model, a codeword experiences many fading realizations; however, the time required to send a single symbol is much smaller than the duration of a coherence interval. The processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are again mutually independent and i.i.d., but change every $N$ channel uses; $N$ is assumed to be sufficiently large for asymptotic coding results to hold within each coherence interval.

• Quasi-static fading model: this model differs fundamentally from the previous ones in that fading coefficients are assumed to remain constant over the transmission of an entire codeword, but change independently and randomly from one codeword to another. The processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are mutually independent and i.i.d., characterizing a situation in which fading variations are on the order of the time required to send an entire codeword.


Remark 5.6. We assume that the fading processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are mutually independent and i.i.d. to simplify the analysis; however, all results described hereafter generalize to the situation in which the processes are correlated, stationary, and ergodic.


For all three models, we illustrate the results numerically by considering the special case of i.i.d. Rayleigh fading, for which $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are mutually independent i.i.d. complex Gaussian processes with

$$H_{m,i} \sim \mathcal{CN}(0, \alpha_m^2) \quad\text{and}\quad H_{e,i} \sim \mathcal{CN}(0, \alpha_e^2).$$

In this case, the fading gains $G_{m,i} = |H_{m,i}|^2$ and $G_{e,i} = |H_{e,i}|^2$ are exponentially distributed with respective means

$$\mu_m \triangleq \mathbb{E}[G_m] = \alpha_m^2 \quad\text{and}\quad \mu_e \triangleq \mathbb{E}[G_e] = \alpha_e^2.$$
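The three fading models differ only in how long a fading coefficient is held fixed. The Python sketch below (the helper `rayleigh_gains` is hypothetical) draws i.i.d. Rayleigh coefficients $H \sim \mathcal{CN}(0, \mu)$ and repeats each resulting gain for a chosen number of channel uses: a coherence length of 1 mimics ergodic fading, an intermediate length $N$ mimics block fading, and a length at least equal to the codeword length mimics quasi-static fading. It is a simulation aid under these assumptions, not a model taken from the text.

```python
import math
import random

def rayleigh_gains(n, coherence, mu, rng):
    """Draw n fading gains G_i = |H_i|^2 with H_i ~ CN(0, mu).

    coherence = 1        -> ergodic fading (new coefficient every channel use)
    1 < coherence < n    -> block fading (coefficient held for `coherence` uses)
    coherence >= n       -> quasi-static fading (one coefficient per codeword)
    """
    std = math.sqrt(mu / 2.0)  # each real dimension carries half the variance
    gains = []
    while len(gains) < n:
        h = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        gains.extend([abs(h) ** 2] * coherence)
    return gains[:n]

rng = random.Random(0)
g_m = rayleigh_gains(8, 4, 1.0, rng)   # block fading, main channel (mu_m = 1)
g_e = rayleigh_gains(8, 4, 2.0, rng)   # block fading, eavesdropper (mu_e = 2)
print(g_m)
print(g_e)
```

Drawing the complex coefficient and squaring its magnitude, rather than sampling an exponential gain directly, keeps the sketch close to the model as stated; both give the same gain distribution.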


The statistics of the noise $N_{m,i}$ and $N_{e,i}$ are assumed known to Alice, Bob, and Eve prior to transmission. Bob has at least instantaneous access to the fading coefficient $h_{m,i}$ and is able to detect symbols coherently. In addition, Eve has instantaneous access to both of the fading coefficients $h_{m,i}$ and $h_{e,i}$, so that the information leakage is always implicitly defined as

$$L(\mathcal{C}_n) \triangleq I(M; Z^n H_m^n H_e^n \mid \mathcal{C}_n),$$

where $H_m^n$ and $H_e^n$ are the sequences of fading coefficients for the main and eavesdropper channels and $\mathcal{C}_n$ is the code used by Alice. Although providing the channel state information of the main channel to the eavesdropper is a pessimistic assumption, this assumption is required in the proofs. We will see that whether or not Alice has instantaneous access to the coefficients $h_{m,i}$ and $h_{e,i}$ has a significant impact on achievable communication rates, and several situations are considered in the next sections.


5.2.1 Ergodic-fading channels


For ergodic-fading channels, the processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are mutually independent and i.i.d. We first assume that Alice, Bob, and Eve have full channel state information (CSI); that is, they all have access instantaneously to the realizations of the fading coefficients $(h_{m,i}, h_{e,i})$. In addition, a symbol sequence $X^n$ is allowed to experience (infinitely) many fading realizations as the blocklength $n$ goes to infinity; the average power constraint $(1/n)\sum_{i=1}^{n}\mathbb{E}[|X_i|^2] \leq P$ is understood as a long-term constraint so that the power can be adjusted depending on the current fading realization.


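Theorem 5.3 (Liang et al.). With full CSI, the secrecy capacity of an ergodic-fading wireless channel is

$$C_s = \max_{\gamma}\,\mathbb{E}_{G_mG_e}\left[\log\left(1 + \frac{\gamma(G_m, G_e)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m, G_e)\,G_e}{\sigma_e^2}\right)\right],$$

where $\gamma : \mathbb{R}_+^2 \to \mathbb{R}_+$ is subject to the constraint $\mathbb{E}[\gamma(G_m, G_e)] \leq P$.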
Proof. The key idea for the achievability part of the proof is that knowledge of the CSI allows Alice, Bob, and Eve to demultiplex the ergodic-fading WTC into a set of parallel and time-invariant Gaussian WTCs. Specifically, this transformation can be done as follows. We partition the range of $G_m$ into $k$ intervals $[g_{m,i}, g_{m,i+1})$ with $i \in \llbracket 1, k\rrbracket$. Similarly, we partition the range of fading gains $G_e$ into $k$ intervals $[g_{e,j}, g_{e,j+1})$ with $j \in \llbracket 1, k\rrbracket$. For simplicity, we first assume that the fading gains are bounded (that is, $g_{m,k+1} < \infty$ and $g_{e,k+1} < \infty$) and let

$$p_i \triangleq \mathbb{P}\left[G_m \in [g_{m,i}, g_{m,i+1})\right] \quad\text{and}\quad q_j \triangleq \mathbb{P}\left[G_e \in [g_{e,j}, g_{e,j+1})\right].$$


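For each pair of indices $(i, j)$, Alice and Bob publicly agree on a transmit power $\gamma_{ij}$ and on a wiretap code $\mathcal{C}_n^{ij}$ of length $n$ designed to operate on a Gaussian WTC with main channel gain $g_{m,i}$ and eavesdropper's channel gain $g_{e,j+1}$. The set of transmit powers $\{\gamma_{ij}\}_{k\times k}$ is also chosen such that $\sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_j\gamma_{ij} \leq P$. If we define

$$C_{ij} \triangleq \left(\log\left(1 + \frac{\gamma_{ij}\,g_{m,i}}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma_{ij}\,g_{e,j+1}}{\sigma_e^2}\right)\right)^+,$$

then, for any $\epsilon > 0$, Corollary 5.1 ensures the existence of a $(2^{nR_{ij}}, n)$ code $\mathcal{C}_n^{ij}$ such that

$$R_{ij} \geq C_{ij} - \epsilon, \quad \frac{1}{n}L(\mathcal{C}_n^{ij}) \leq \delta(\epsilon), \quad\text{and}\quad P_e(\mathcal{C}_n^{ij}) \leq \delta(\epsilon).$$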


Note that a Gaussian channel with gain $g_{m,i}$ is stochastically degraded with respect to any Gaussian channel with gain $g \in [g_{m,i}, g_{m,i+1})$; therefore $\mathcal{C}_n^{ij}$ also guarantees that $P_e(\mathcal{C}_n^{ij}) \leq \delta(\epsilon)$ for a main channel gain $g \in [g_{m,i}, g_{m,i+1})$. Similarly, a Gaussian channel with gain $g \in [g_{e,j}, g_{e,j+1})$ is stochastically degraded with respect to a Gaussian channel with gain $g_{e,j+1}$; therefore, by Proposition 3.3, $\mathcal{C}_n^{ij}$ guarantees that $\frac{1}{n}L(\mathcal{C}_n^{ij}) \leq \delta(\epsilon)$ for an eavesdropper's channel gain $g \in [g_{e,j}, g_{e,j+1})$.

Since all fading coefficients are available to the transmitter and receivers, the ergodic-fading WTC can be demultiplexed into $k^2$ independent, time-invariant Gaussian WTCs. The set of codes $\{\mathcal{C}_n^{ij}\}_{k\times k}$ for the demultiplexed channels can be viewed as a single code $\mathcal{C}_n$ for the ergodic-fading channel, whose rate $R_s$ is the sum of secure rates $R_{ij}$ achieved over each channel weighted by the probability $p_iq_j$ that the code $\mathcal{C}_n^{ij}$ is used. Therefore,

$$R_s = \sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_jR_{ij} \geq \sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_j\left(\log\left(1 + \frac{\gamma_{ij}\,g_{m,i}}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma_{ij}\,g_{e,j+1}}{\sigma_e^2}\right)\right)^+ - \epsilon,$$

subject to the power constraint

$$\sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_j\gamma_{ij} \leq P.$$
Additionally,

$$\frac{1}{n}L(\mathcal{C}_n) = \sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_j\,\frac{1}{n}L(\mathcal{C}_n^{ij}) \leq \delta(\epsilon)$$

and

$$P_e(\mathcal{C}_n) = \sum_{i=1}^{k}\sum_{j=1}^{k}p_iq_j\,P_e(\mathcal{C}_n^{ij}) \leq \delta(\epsilon).$$


Note that $k$ can be chosen arbitrarily large and $\epsilon$ can be chosen arbitrarily small. Hence, the ergodicity of the channel ensures that all rates $R_s$ such that

$$R_s < \mathbb{E}_{G_mG_e}\left[\left(\log\left(1 + \frac{\gamma(G_m, G_e)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m, G_e)\,G_e}{\sigma_e^2}\right)\right)^+\right], \tag{5.22}$$

with $\gamma : \mathbb{R}_+^2 \to \mathbb{R}_+$ a power-allocation function such that $\mathbb{E}[\gamma(G_m, G_e)] \leq P$, are achievable full secrecy rates. One can further increase the upper bound in (5.22) by optimizing over the power allocations $\gamma$ such that $\mathbb{E}[\gamma(G_m, G_e)] \leq P$. Note that the maximization over $\gamma$ also allows us to drop the operator $(\cdot)^+$, which yields the expression in Theorem 5.3.

If the range of fading gains is unbounded, we define an arbitrary but finite threshold $g_{\max}$ beyond which Alice and Bob do not communicate. The multiplexing scheme developed above applies directly for fading gains below $g_{\max}$; however, there is a rate penalty because $\mathbb{P}[G_m > g_{\max} \text{ or } G_e > g_{\max}] > 0$ and Alice and Bob do not communicate for a fraction of fading realizations. Nevertheless, the penalty can be made as small as desired by choosing $g_{\max}$ large enough.

We omit the converse part of the proof of the theorem, which can be obtained from a 
converse argument for parallel Gaussian WTCs. The ideas behind the proof are similar 
to those used in Section 3.5.3, with the necessary modifications to account for parallel 
channels. We refer the interested reader to [85] for more details. 
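The demultiplexing argument in the proof above can be mimicked numerically. The sketch below relies on assumptions that are not in the text: i.i.d. Rayleigh fading, the same $k$ equal-width bins on $[0, g_{\max})$ for both fading gains, a constant power $\gamma_{ij} = P$ in every bin pair (which satisfies the power constraint), and base-2 logarithms. The helper names are hypothetical. It computes $\sum_{i,j}p_iq_jC_{ij}$ with the worst-case gains $g_{m,i}$ and $g_{e,j+1}$ used in the proof.

```python
import math

def exp_cdf(g, mu):
    """P[G <= g] for an exponentially distributed gain with mean mu (Rayleigh fading)."""
    return 1.0 - math.exp(-g / mu)

def demux_rate(k, g_max, P, mu_m, mu_e, var_m=1.0, var_e=1.0):
    """Secure rate (bits/channel use) of the demultiplexed code with constant power P.

    Both gain ranges [0, g_max) are split into k equal bins. For the bin pair
    (i, j), the wiretap code is designed for the worst case in the pair: the
    smallest main gain g_{m,i} and the largest eavesdropper gain g_{e,j+1}.
    Gains at or above g_max are not used, which costs a small rate penalty.
    """
    edges = [g_max * t / k for t in range(k + 1)]
    rate = 0.0
    for i in range(k):
        p_i = exp_cdf(edges[i + 1], mu_m) - exp_cdf(edges[i], mu_m)
        c_main = math.log2(1 + P * edges[i] / var_m)
        for j in range(k):
            q_j = exp_cdf(edges[j + 1], mu_e) - exp_cdf(edges[j], mu_e)
            c_eve = math.log2(1 + P * edges[j + 1] / var_e)
            rate += p_i * q_j * max(c_main - c_eve, 0.0)   # p_i q_j C_ij
    return rate

for k in (4, 16, 64):
    print(k, demux_rate(k, g_max=20.0, P=10.0, mu_m=1.0, mu_e=2.0))
```

Doubling $k$ refines the partition, so the worst-case gains move closer to the actual gains and the computed rate cannot decrease; this mirrors the step in the proof where $k$ is taken arbitrarily large.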


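The power allocation $\gamma$ that maximizes $C_s$ in Theorem 5.3 can be characterized exactly.

Proposition 5.5. The power allocation $\gamma^* : \mathbb{R}_+^2 \to \mathbb{R}_+$ that achieves $C_s$ in Theorem 5.3 is defined as follows:

• if $u > 0$ and $v = 0$, then

$$\gamma^*(u, v) = \left(\frac{1}{\lambda} - \frac{\sigma_m^2}{u}\right)^+;$$

• if $u/\sigma_m^2 > v/\sigma_e^2 > 0$, then

$$\gamma^*(u, v) = \left[\frac{1}{2}\left(\sqrt{\left(\frac{\sigma_e^2}{v} - \frac{\sigma_m^2}{u}\right)^2 + \frac{4}{\lambda}\left(\frac{\sigma_e^2}{v} - \frac{\sigma_m^2}{u}\right)} - \left(\frac{\sigma_m^2}{u} + \frac{\sigma_e^2}{v}\right)\right)\right]^+;$$

• $\gamma^*(u, v) = 0$ otherwise;

with $\lambda > 0$ such that $\mathbb{E}[\gamma^*(G_m, G_e)] = P$.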


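Proof. For simplicity, we assume that the fading realizations take a finite number of values and we accept the fact that the proof extends to an infinite number of values. For any $(u, v) \in \mathbb{R}_+^2$, we define the function

$$f_{u,v} : \gamma \mapsto \log\left(1 + \frac{\gamma u}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma v}{\sigma_e^2}\right).$$

If $u/\sigma_m^2 \leq v/\sigma_e^2$, the function $f_{u,v}$ takes nonpositive values; hence, without loss of optimality, we can set

$$\gamma^*(u, v) = 0 \quad\text{if}\quad \frac{u}{\sigma_m^2} \leq \frac{v}{\sigma_e^2}.$$

If $u/\sigma_m^2 > v/\sigma_e^2$, the function $f_{u,v}$ is concave in $\gamma$; consequently, the secrecy capacity, which is a weighted sum of functions $f_{u,v}$, is concave as well. Therefore, the optimal power allocation $\gamma^*$ can be obtained by forming the Lagrangian

$$\begin{aligned}\mathcal{L} \triangleq {} & \sum_u\sum_v\log\left(1 + \frac{\gamma(u, v)\,u}{\sigma_m^2}\right)p_{G_m}(u)\,p_{G_e}(v) - \sum_u\sum_v\log\left(1 + \frac{\gamma(u, v)\,v}{\sigma_e^2}\right)p_{G_m}(u)\,p_{G_e}(v)\\ & - \lambda\sum_u\sum_v\gamma(u, v)\,p_{G_m}(u)\,p_{G_e}(v),\end{aligned}$$

and finding $\gamma(u, v) \geq 0$ maximizing $\mathcal{L}$ for each $(u, v)$ such that $u/\sigma_m^2 > v/\sigma_e^2$.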


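• If $v = 0$ and $u > 0$, the derivative of $\mathcal{L}$ with respect to $\gamma(u, v)$ is

$$\frac{\partial\mathcal{L}}{\partial\gamma(u, v)} = \frac{u}{\sigma_m^2 + \gamma(u, v)\,u}\,p_{G_m}(u)\,p_{G_e}(v) - \lambda\,p_{G_m}(u)\,p_{G_e}(v).$$

Therefore,

$$\frac{\partial\mathcal{L}}{\partial\gamma(u, v)} = 0 \iff \gamma(u, v) = \frac{1}{\lambda} - \frac{\sigma_m^2}{u}.$$

• If $u/\sigma_m^2 > v/\sigma_e^2 > 0$, we obtain

$$\frac{\partial\mathcal{L}}{\partial\gamma(u, v)} = \frac{\sigma_e^2u - \sigma_m^2v}{\left(\sigma_m^2 + \gamma(u, v)\,u\right)\left(\sigma_e^2 + \gamma(u, v)\,v\right)}\,p_{G_m}(u)\,p_{G_e}(v) - \lambda\,p_{G_m}(u)\,p_{G_e}(v).$$

Therefore,

$$\frac{\partial\mathcal{L}}{\partial\gamma(u, v)} = 0 \iff \frac{\sigma_e^2u - \sigma_m^2v}{\left(\sigma_m^2 + \gamma(u, v)\,u\right)\left(\sigma_e^2 + \gamma(u, v)\,v\right)} = \lambda$$

and, consequently,

$$\gamma(u, v) = \frac{1}{2}\left(\sqrt{\left(\frac{\sigma_e^2}{v} - \frac{\sigma_m^2}{u}\right)^2 + \frac{4}{\lambda}\left(\frac{\sigma_e^2}{v} - \frac{\sigma_m^2}{u}\right)} - \left(\frac{\sigma_m^2}{u} + \frac{\sigma_e^2}{v}\right)\right).$$

Figure 5.6 Secure communication rates over the Rayleigh fading channel for $\mu_m = 1$, $\mu_e = 2$, and $\sigma_m^2 = \sigma_e^2 = 1$, and for different knowledge of the CSI. The bursty signaling strategy is based only on knowledge of the CSI for the main channel. (Axes: secure rates in bits/channel use versus average transmission power in dB; curves: full CSI, bursty signaling, no CSI.)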


Note that the optimal power-allocation strategy for the secrecy capacity with full CSI depends on the fading statistics only through the parameter $\lambda$. In addition, the fact that $\gamma^*(g_m, g_e) = 0$ if $g_m/\sigma_m^2 < g_e/\sigma_e^2$ is consistent with the intuition that no power should be allocated when the eavesdropper obtains a better instantaneous SNR than does the legitimate receiver.
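Proposition 5.5 leaves a single parameter, $\lambda$, to be tuned so that the average power equals $P$. The minimal Python sketch below assumes finitely many fading states, as in the proof, and natural logarithms in the Lagrangian (a base-2 objective only rescales $\lambda$ by $\ln 2$). The hypothetical function `gamma_star` evaluates $\gamma^*(u, v)$, and `optimal_allocation` finds $\lambda$ by bisection, using the fact that $\gamma^*$ is nonincreasing in $\lambda$. The example states and probabilities are arbitrary.

```python
import math

def gamma_star(u, v, lam, var_m=1.0, var_e=1.0):
    """Power gamma*(u, v) of Proposition 5.5 for main gain u and eavesdropper gain v."""
    if u <= 0.0 or u / var_m <= v / var_e:
        return 0.0                      # Eve at least as good as Bob: stay silent
    a = var_m / u                       # sigma_m^2 / u
    if v == 0.0:
        return max(1.0 / lam - a, 0.0)  # water-filling with level 1/lambda
    b = var_e / v                       # sigma_e^2 / v, and b > a here
    root = math.sqrt((b - a) ** 2 + 4.0 * (b - a) / lam)
    return max(0.5 * (root - (a + b)), 0.0)

def optimal_allocation(states, P, var_m=1.0, var_e=1.0, iters=200):
    """Choose lambda so that E[gamma*(G_m, G_e)] = P.

    states: list of (probability, u, v) describing finitely many fading states,
    at least one of which satisfies u / var_m > v / var_e.
    """
    def avg_power(lam):
        return sum(p * gamma_star(u, v, lam, var_m, var_e) for p, u, v in states)

    lo, hi = 1e-12, 1e12                # avg_power decreases as lambda grows
    for _ in range(iters):
        mid = math.sqrt(lo * hi)        # bisection on a logarithmic scale
        if avg_power(mid) > P:
            lo = mid
        else:
            hi = mid
    lam = math.sqrt(lo * hi)
    return lam, [gamma_star(u, v, lam, var_m, var_e) for _, u, v in states]

states = [(0.25, 2.0, 0.5), (0.25, 1.0, 0.0), (0.25, 0.5, 1.0), (0.25, 3.0, 1.0)]
lam, powers = optimal_allocation(states, P=2.0)
print(lam, powers)   # the state with u/sigma_m^2 <= v/sigma_e^2 gets zero power
```

For every state that receives positive power, the derivative of $f_{u,v}$ at $\gamma^*(u, v)$ equals $\lambda$, which is the stationarity condition derived in the proof.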

Theorem 5.3 is illustrated in the case of Rayleigh fading in Figure 5.6. Even in this case, there is no closed-form expression for the secrecy capacity; nevertheless, since $\gamma^*(g_m, g_e) \to \infty$ as $P \to \infty$ for all $(g_m, g_e)$ such that $g_m/\sigma_m^2 > g_e/\sigma_e^2$, we can approximate $C_s$ in the limit of high power as follows. If $\mathbb{P}[G_m/\sigma_m^2 > G_e/\sigma_e^2] > 0$, then

$$\lim_{P\to\infty}C_s(P) = \mathbb{E}_{G_mG_e}\left[\left(\log\frac{\sigma_e^2G_m}{\sigma_m^2G_e}\right)^+\right] = \log\left(1 + \frac{\sigma_e^2\mu_m}{\sigma_m^2\mu_e}\right),$$

which depends only on the ratio of the average SNR at the receiver $\mu_mP/\sigma_m^2$ and that at the eavesdropper $\mu_eP/\sigma_e^2$. Notice that, as $P$ goes to infinity, $C_s(P)$ is strictly positive even
if, on average, the eavesdropper has a better channel than does the legitimate receiver.

Figure 5.7 Equivalent channel model without eavesdropper's channel state information at the transmitter and legitimate receiver.

A closer look at the formula for the secrecy capacity with full CSI given in Theorem 5.3 shows that the secrecy capacity is strictly positive for any transmit power and channel statistics, provided that $\mathbb{P}[G_m/\sigma_m^2 > G_e/\sigma_e^2] > 0$. This result contrasts sharply with the Gaussian WTC, for which secure communication is impossible if the eavesdropper has a better channel. Hence, the fading affecting wireless channels is beneficial for security. Nevertheless, our result relies on the demultiplexing of the ergodic-fading WTC, which requires the fading coefficients of both channels to be known at the transmitter. The optimal power allocation derived in Proposition 5.5 requires Alice to allocate opportunistically more power during the coherence intervals $(h_m, h_e)$ for which Eve has a lower SNR than does Bob.
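The high-power limit for Rayleigh fading can be checked by simulation. The sketch below, assuming i.i.d. Rayleigh fading and base-2 logarithms, compares the closed form $\log(1 + \sigma_e^2\mu_m/(\sigma_m^2\mu_e))$ with a Monte Carlo estimate of $\mathbb{E}[(\log(\sigma_e^2G_m/(\sigma_m^2G_e)))^+]$, using the parameters of Figure 5.6, for which Eve has the better channel on average. The function names, sample size, and seed are arbitrary choices.

```python
import math
import random

def high_power_limit(mu_m, mu_e, var_m=1.0, var_e=1.0):
    """Closed-form limit log2(1 + sigma_e^2 mu_m / (sigma_m^2 mu_e)) for Rayleigh fading."""
    return math.log2(1 + var_e * mu_m / (var_m * mu_e))

def high_power_limit_mc(mu_m, mu_e, var_m=1.0, var_e=1.0, n=100_000, seed=0):
    """Monte Carlo estimate of E[(log2(sigma_e^2 G_m / (sigma_m^2 G_e)))^+]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        g_m = rng.expovariate(1.0 / mu_m)   # exponential gain with mean mu_m
        g_e = rng.expovariate(1.0 / mu_e)   # exponential gain with mean mu_e
        total += max(math.log2(var_e * g_m / (var_m * g_e)), 0.0)
    return total / n

# Figure 5.6 parameters: Eve has the better average channel (mu_e = 2 > mu_m = 1)
print(high_power_limit(1.0, 2.0))      # about 0.585 bits/channel use
print(high_power_limit_mc(1.0, 2.0))
```

Even with $\mu_e > \mu_m$, the limit stays strictly positive, which is the point made in the paragraph above.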

For more realistic applications, it is desirable to relax the full CSI assumption and, in particular, to evaluate secure communication rates without knowledge of the eavesdropper's fading coefficient $h_e$ at the transmitter.


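Proposition 5.6. With CSI for the main channel but without CSI for the eavesdropper's channel, all rates $R_s$ such that

$$R_s < \max_{\gamma}\,\mathbb{E}_{G_mG_e}\left[\log\left(1 + \frac{\gamma(G_m)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m)\,G_e}{\sigma_e^2}\right)\right],$$

where $\gamma : \mathbb{R}_+ \to \mathbb{R}_+$ is subject to the constraint $\mathbb{E}[\gamma(G_m)] \leq P$, are achievable full secrecy rates.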


Proof. The demultiplexing scheme used to prove Theorem 5.3 cannot be used directly because the transmitter and legitimate receiver do not know the eavesdropper's CSI. Nevertheless, it is still possible to demultiplex the channel based on the main channel CSI, and one can include the fading coefficients affecting the eavesdropper's channel in the channel statistics. As illustrated in Figure 5.7, the eavesdropper's knowledge of her channel coefficients can be taken into account by treating $H_e$ as a second output for the eavesdropper's channel. The range of $G_m$ is partitioned into $k$ intervals $[g_{m,i}, g_{m,i+1})$ with $i \in \llbracket 1, k\rrbracket$. For simplicity, we assume again that the fading gain is bounded ($g_{m,k+1} < \infty$) and let

$$p_i \triangleq \mathbb{P}\left[G_m \in [g_{m,i}, g_{m,i+1})\right].$$
For each index $i \in \llbracket 1, k\rrbracket$, Alice and Bob publicly agree on a transmit power $\gamma_i$ and on a wiretap code $\mathcal{C}_n^i$ designed to operate on a WTC with transition probabilities $p_{YZH_e|X}$ such that the marginal $p_{Y|X}$ corresponds to a Gaussian channel with known constant channel gain $g_{m,i}$, while $p_{ZH_e|X}$ corresponds to a fading eavesdropper channel with i.i.d. fading coefficient $H_e$ treated as a second output for the eavesdropper. Since the fading coefficient $H_e$ is independent of the input, note that

$$\forall (z, h_e, x) \quad p_{ZH_e|X}(z, h_e|x) = p_{Z|H_eX}(z|h_e, x)\,p_{H_e}(h_e).$$


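The set of transmit powers $\{\gamma_i\}_k$ is also chosen such that $\sum_{i=1}^{k}p_i\gamma_i \leq P$. We can apply Theorem 3.4 to this channel and, for any $\epsilon > 0$ and input distribution $p_X$, this shows the existence of a wiretap code $\mathcal{C}_n^i$ of length $n$ with rate

$$R_i \geq I(X;Y) - I(X;ZH_e) - \epsilon,$$

such that $\frac{1}{n}L(\mathcal{C}_n^i) \leq \delta(\epsilon)$ and $P_e(\mathcal{C}_n^i) \leq \delta(\epsilon)$. In particular, for the specific choice $X \sim \mathcal{CN}(0, \gamma_i)$, we obtain

$$I(X;Y) = \log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right)$$

and

$$I(X;ZH_e) = I(X;H_e) + I(X;Z|H_e) = \mathbb{E}_{G_e}\left[\log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right],$$

where we have used $I(X;H_e) = 0$ since $H_e$ is independent of the channel input $X$ by assumption. Therefore,

$$R_i \geq \log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - \mathbb{E}_{G_e}\left[\log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right] - \epsilon.$$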


Because the fading coefficient $H_m$ is known by the transmitter and receivers, this ergodic-fading WTC can be demultiplexed into $k$ WTCs with time-invariant Gaussian main channel and fading eavesdropper's channel as in Figure 5.7. The set of codes $\{\mathcal{C}_n^i\}_k$ for the demultiplexed channels can be viewed as a single code $\mathcal{C}_n$ for the ergodic-fading channel, whose rate is

$$R_s = \sum_{i=1}^{k}p_iR_i \geq \sum_{i=1}^{k}p_i\left(\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - \mathbb{E}_{G_e}\left[\log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right]\right) - \epsilon,$$

and subject to the constraint

$$\sum_{i=1}^{k}p_i\gamma_i \leq P.$$
In addition,

$$\frac{1}{n}L(\mathcal{C}_n) = \sum_{i=1}^{k}p_i\,\frac{1}{n}L(\mathcal{C}_n^i) \leq \delta(\epsilon)$$

and

$$P_e(\mathcal{C}_n) = \sum_{i=1}^{k}p_i\,P_e(\mathcal{C}_n^i) \leq \delta(\epsilon).$$

Note that $k$ can be chosen arbitrarily large and $\epsilon$ can be chosen arbitrarily small. Hence, the ergodicity of the channel guarantees that all rates $R_s$ such that

$$R_s < \mathbb{E}_{G_mG_e}\left[\log\left(1 + \frac{\gamma(G_m)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m)\,G_e}{\sigma_e^2}\right)\right] \tag{5.23}$$

with $\mathbb{E}[\gamma(G_m)] \leq P$ are achievable full secrecy rates. Finally, we can improve the upper bound in (5.23) by optimizing over all power allocations $\gamma$ satisfying the constraint $\mathbb{E}_{G_m}[\gamma(G_m)] \leq P$.


Although the achievable rates in Theorem 5.3 and Proposition 5.6 differ only in the 
arguments of the power-allocation function $\gamma$, this similarity is misleading because the
underlying codes are fundamentally different. In Theorem 5.3, all parties have access 
to full CSI about the channels, and the code is composed of independent wiretap codes 
for Gaussian WTCs that are multiplexed to adapt to the time-varying fading gains. 
However, in Proposition 5.6, the code is composed of independent wiretap codes that are 
interleaved to adapt to the main channel fading gain only and whose codewords spread 
over many different realizations of the eavesdropper’s channel gain. 

The optimal power allocation $\gamma : \mathbb{R}_+ \to \mathbb{R}_+$ for Proposition 5.6 cannot be derived exactly because the objective function

$$\mathbb{E}_{G_m}\left[\log\left(1 + \frac{\gamma(G_m)\,G_m}{\sigma_m^2}\right) - \mathbb{E}_{G_e}\left[\log\left(1 + \frac{\gamma(G_m)\,G_e}{\sigma_e^2}\right)\right]\right]$$

is not concave in $\gamma$. A Lagrangian maximization as in Proposition 5.5 would allow us to compute achievable full secrecy rates, but, in general, $\gamma$ does not admit a closed-form expression. Instead, we consider a simple bursty signaling⁴ strategy, in which the transmitter selects a threshold $\tau > 0$ and allocates power as

$$\gamma(u) = \begin{cases} P_\tau \triangleq P/\mathbb{P}[G_m > \tau] & \text{if } u > \tau,\\ 0 & \text{otherwise.}\end{cases}$$

For Rayleigh fading, we can compute the bound in Proposition 5.6 in closed form in terms of the exponential-integral function

$$E_1(x) \triangleq \int_x^{\infty}\frac{e^{-y}}{y}\,dy.$$


4 Bursty signaling is also called “on-off” power control. 


Evaluating (5.23) explicitly for bursty signaling over a wireless channel with i.i.d. Rayleigh fading shows that all secure rates $R_s$ such that

$$\begin{aligned} R_s < {} & \exp\left(-\frac{\tau}{\mu_m}\right)\log\left(1 + \frac{\tau P_\tau}{\sigma_m^2}\right) + \exp\left(\frac{\sigma_m^2}{\mu_mP_\tau}\right)E_1\left(\frac{\tau}{\mu_m} + \frac{\sigma_m^2}{\mu_mP_\tau}\right)\log(e)\\ & - \exp\left(\frac{\sigma_e^2}{\mu_eP_\tau} - \frac{\tau}{\mu_m}\right)E_1\left(\frac{\sigma_e^2}{\mu_eP_\tau}\right)\log(e)\end{aligned}$$

are achievable. As illustrated in Figure 5.6, the lack of knowledge about the eavesdropper's channel has a detrimental effect on secure communication rates.
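The closed-form bound above can be checked numerically. Python's standard library has no exponential integral, so the sketch below includes a small $E_1$ routine (a power series for $x \leq 1$ and a continued fraction evaluated with the modified Lentz method for $x > 1$, a standard textbook approach) and compares the closed form with a Monte Carlo estimate of (5.23) under bursty signaling with threshold $\tau$ and $P_\tau = P/\mathbb{P}[G_m > \tau] = Pe^{\tau/\mu_m}$ for Rayleigh fading. All names, parameter values, and the seed are illustrative assumptions; rates are in bits.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp1(x):
    """Exponential integral E_1(x) = int_x^inf exp(-y)/y dy for x > 0."""
    if x <= 1.0:
        # Series: E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = -EULER_GAMMA - math.log(x), 1.0
        for k in range(1, 200):
            term *= -x / k              # term = (-x)^k / k!
            total -= term / k
            if abs(term / k) < 1e-17:
                break
        return total
    # Continued fraction for x > 1, evaluated with the modified Lentz method
    b, c = x + 1.0, 1e30
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def bursty_ergodic_rate(P, tau, mu_m, mu_e, var_m=1.0, var_e=1.0):
    """Closed-form bound (bits/channel use), bursty signaling, ergodic Rayleigh fading."""
    p_tau = P * math.exp(tau / mu_m)    # P / P[G_m > tau]
    log2e = math.log2(math.e)
    x_m = var_m / (mu_m * p_tau)
    x_e = var_e / (mu_e * p_tau)
    main = (math.exp(-tau / mu_m) * math.log2(1 + tau * p_tau / var_m)
            + math.exp(x_m) * exp1(tau / mu_m + x_m) * log2e)
    eve = math.exp(x_e - tau / mu_m) * exp1(x_e) * log2e
    return main - eve

def bursty_ergodic_rate_mc(P, tau, mu_m, mu_e, var_m=1.0, var_e=1.0, n=100_000, seed=1):
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = random.Random(seed)
    p_tau = P * math.exp(tau / mu_m)
    total = 0.0
    for _ in range(n):
        g_m = rng.expovariate(1.0 / mu_m)
        g_e = rng.expovariate(1.0 / mu_e)
        if g_m > tau:                   # transmit only above the threshold
            total += math.log2(1 + p_tau * g_m / var_m) - math.log2(1 + p_tau * g_e / var_e)
    return total / n

print(bursty_ergodic_rate(10.0, 0.5, 1.0, 2.0))
print(bursty_ergodic_rate_mc(10.0, 0.5, 1.0, 2.0))
```

The eavesdropper term is not clipped at zero here, since in the ergodic model each codeword spans many realizations of Eve's channel; the resulting bound can therefore be negative for unfavorable parameters.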

If assumptions are further relaxed, and no CSI is available at the transmitter, then power cannot be allocated to avoid harmful situations in which the eavesdropper has a higher SNR than that of the legitimate receiver. In particular, if $\mathbb{E}[G_e]/\sigma_e^2 > \mathbb{E}[G_m]/\sigma_m^2$, so that the eavesdropper obtains a better average SNR than does the legitimate receiver, the secrecy capacity without any CSI is zero.


5.2.2 Block-fading channels


It is important to realize that the conclusions drawn regarding the effect of CSI depend on the fading statistics considered; different fading models lead to slightly different conclusions. In this section, we consider a block-fading model, for which the coherence interval is sufficiently long that coding can also be performed within the interval. Specifically, the processes $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ are i.i.d., but for each realization $(h_{m,i}, h_{e,i})$ the relationships between channel inputs and outputs are

$$Y_{i,j} = h_{m,i}X_{i,j} + N_{m,i,j} \quad\text{and}\quad Z_{i,j} = h_{e,i}X_{i,j} + N_{e,i,j}, \qquad j \in \llbracket 1, N\rrbracket,$$

where $N$ is assumed to be sufficiently large for asymptotic coding results to hold.

If the transmitter and receivers have CSI about all channels, then the demultiplexing and power-allocation scheme used for the ergodic-fading WTC can be used, and the secrecy capacity is given again by Theorem 5.3 with the optimal power allocation of Proposition 5.5. The situation is quite different without knowledge about the eavesdropper's fading at the transmitter. In fact, for the ergodic-fading model considered in Section 5.2.1, the transmitter is allowed a single channel use per coherence interval; consequently, the information leaked to the eavesdropper can be arbitrarily large. In contrast, for the block-fading model, the transmitter can code within each coherence interval and the information leaked to the eavesdropper cannot exceed the information communicated to the legitimate receiver. Specifically, we let $X^N$ represent a coded sequence chosen at random in the transmitter's codebook and sent during one coherence interval, and we let $Z^N$ denote the corresponding eavesdropper's observation. Then, it holds that

$$I(X^N; Z^N) \leq H(X^N) < \infty,$$

because $X^N$ takes a finite number of values.
Theorem 5.4 (Gopala et al.). The secrecy capacity of a block-fading WTC with CSI about the main channel but no CSI about the eavesdropper's channel is

$$C_s = \max_{\gamma}\,\mathbb{E}_{G_mG_e}\left[\left(\log\left(1 + \frac{\gamma(G_m)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m)\,G_e}{\sigma_e^2}\right)\right)^+\right],$$

subject to the constraint $\mathbb{E}[\gamma(G_m)] \leq P$.


Proof. We provide only the achievability part of the proof, and refer the reader to [86] for details regarding the converse. The key ideas behind the code construction are to code within each coherence interval in order to bound the information leaked to the eavesdropper and to spread the codewords over many realizations of the eavesdropper's fading gain. The proof is greatly simplified by noting that the block-fading channel can be treated as an ergodic-fading channel with a vector input $X^N$ and vector outputs $Y^N$ and $Z^N$ such that

$$Y^N = H_mX^N + N_m^N \quad\text{and}\quad Z^N = H_eX^N + N_e^N.$$

Therefore, we can use the same approach as in the proof of Proposition 5.6.
The range of $G_m$ is partitioned into $k$ intervals $[g_{m,i}, g_{m,i+1})$ with $i \in \llbracket 1, k\rrbracket$. We assume the fading gain to be bounded ($g_{m,k+1} < \infty$), and we let

$$p_i \triangleq \mathbb{P}\left[G_m \in [g_{m,i}, g_{m,i+1})\right].$$

For each index $i \in \llbracket 1, k\rrbracket$, Alice and Bob publicly agree on a transmit power $\gamma_i$ and on a wiretap code $\mathcal{C}_n^i$ of length $n$ designed to operate on a vector WTC with transition probabilities $p_{Y^NZ^NH_e|X^N}$. Note that the marginal $p_{Y^N|X^N}$ is such that

$$\forall (y^N, x^N) \quad p_{Y^N|X^N}(y^N|x^N) = \prod_{j=1}^{N}p_{Y|X}(y_j|x_j),$$
where $p_{Y|X}$ corresponds to a Gaussian channel with known constant gain $g_{m,i}$. Similarly, the marginal $p_{Z^NH_e|X^N}$ is such that

$$\forall (z^N, h_e, x^N) \quad p_{Z^NH_e|X^N}(z^N, h_e|x^N) = \left(\prod_{j=1}^{N}p_{Z|H_eX}(z_j|h_e, x_j)\right)p_{H_e}(h_e),$$

where $p_{Z|H_eX}$ corresponds to a fading eavesdropper channel with fading coefficient $H_e$ available to the eavesdropper. The set of transmit powers is also chosen such that $\sum_{i=1}^{k}p_i\gamma_i \leq P$. By Theorem 3.4, for any $\epsilon > 0$ and input distribution $p_{X^N}$, there exists a wiretap code $\mathcal{C}_n^i$ of length $n$ with rate

$$R_i \geq I(X^N; Y^N) - I(X^N; Z^NH_e) - \epsilon, \tag{5.24}$$

measured in bits per vector channel use and such that $\frac{1}{n}L(\mathcal{C}_n^i) \leq \delta(\epsilon)$ and $P_e(\mathcal{C}_n^i) \leq \delta(\epsilon)$. We are free to optimize the distribution of $X^N$ as long as the power constraint $\frac{1}{N}\sum_{j=1}^{N}\mathbb{E}[|X_j|^2] \leq \gamma_i$ is satisfied; in particular, we can choose $X^N$ to represent the codewords chosen uniformly at random in a codebook such that

$$N\left(\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - \epsilon\right) \leq H(X^N) \leq N\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right)$$

and whose probability of error over a Gaussian channel with gain $g_{m,i}$ is at most $\delta(\epsilon)$.
The existence of this code is ensured directly by the channel coding theorem if $N$ is large enough. Since $X^N$ represents a codeword in a codebook, it follows from Fano's inequality that

$$H(X^N|Y^N) \leq N\delta(\epsilon).$$

Therefore,

$$I(X^N; Y^N) = H(X^N) - H(X^N|Y^N) \geq N\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - N\delta(\epsilon). \tag{5.25}$$


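Note that $I(X^N; H_e) = 0$ because the fading coefficient $H_e$ is independent of the input and that the channel is memoryless; therefore,

$$\begin{aligned} I(X^N; Z^NH_e) &= I(X^N; Z^N|H_e)\\ &= \mathbb{E}_{H_e}\left[I(X^N; Z^N|H_e = h_e)\right]\\ &\leq N\,\mathbb{E}_{G_e}\left[\log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right]. \end{aligned} \tag{5.26}$$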


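Finally, note that the following trivial upper bound holds:

$$\forall h_e \quad I(X^N; Z^N|H_e = h_e) \leq H(X^N) \leq N\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right). \tag{5.27}$$

On combining (5.25), (5.26), and (5.27) in (5.24) we obtain

$$R_i \geq N\,\mathbb{E}_{G_e}\left[\left(\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right)^+\right] - N\delta(\epsilon).$$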


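Since the fading coefficient $H_m$ is known by the transmitter and receivers, the channel can be demultiplexed into $k$ vector-input WTCs. The set of codes $\{\mathcal{C}_n^i\}_k$ for the demultiplexed vector channels can be viewed as a single code $\mathcal{C}_n$ for the block-ergodic fading channel, whose rate in bits per channel use is

$$R_s = \frac{1}{N}\sum_{i=1}^{k}p_iR_i \geq \sum_{i=1}^{k}p_i\,\mathbb{E}_{G_e}\left[\left(\log\left(1 + \frac{g_{m,i}\,\gamma_i}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma_i\,G_e}{\sigma_e^2}\right)\right)^+\right] - \delta(\epsilon),$$

and subject to the constraint $\sum_{i=1}^{k}p_i\gamma_i \leq P$. In addition,

$$\frac{1}{nN}L(\mathcal{C}_n) = \sum_{i=1}^{k}p_i\,\frac{1}{nN}L(\mathcal{C}_n^i) \leq \delta(\epsilon)$$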
and

$$P_e(\mathcal{C}_n) = \sum_{i=1}^{k}p_i\,P_e(\mathcal{C}_n^i) \leq \delta(\epsilon).$$

Note that $k$ can be chosen arbitrarily large and $\epsilon$ can be chosen arbitrarily small. Hence, the ergodicity of the channel guarantees that all rates $R_s$ such that

$$R_s < \mathbb{E}_{G_mG_e}\left[\left(\log\left(1 + \frac{\gamma(G_m)\,G_m}{\sigma_m^2}\right) - \log\left(1 + \frac{\gamma(G_m)\,G_e}{\sigma_e^2}\right)\right)^+\right]$$

with $\mathbb{E}_{G_m}[\gamma(G_m)] \leq P$ are achievable full secrecy rates (in bits per channel use).


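Theorem 5.4 differs from Proposition 5.6 only by the presence of the operator $(\cdot)^+$ in the expectation, which appears because the information leaked to the eavesdropper within each coherence interval is bounded. The formula for the secrecy capacity highlights again that fading is beneficial for security and, in contrast to ergodic fading, the lack of knowledge about the eavesdropper's CSI at the transmitter seems to incur a lesser penalty for block-ergodic fading. For Rayleigh fading, the expression in Theorem 5.4 can be computed in closed form for the bursty signaling strategy defined in Section 5.2.1. Bursty signaling over wireless channels with i.i.d. Rayleigh fading can achieve all rates $R_s$ such that

$$\begin{aligned} R_s < {} & \exp\left(-\frac{\tau}{\mu_m}\right)\log\left(1 + \frac{\tau P_\tau}{\sigma_m^2}\right) + \exp\left(\frac{\sigma_m^2}{\mu_mP_\tau}\right)E_1\left(\frac{\tau}{\mu_m} + \frac{\sigma_m^2}{\mu_mP_\tau}\right)\log(e)\\ & - \exp\left(\frac{\sigma_m^2}{\mu_mP_\tau} + \frac{\sigma_e^2}{\mu_eP_\tau}\right)E_1\left(\frac{\tau}{\mu_m} + \frac{\sigma_e^2\tau}{\sigma_m^2\mu_e} + \frac{\sigma_m^2}{\mu_mP_\tau} + \frac{\sigma_e^2}{\mu_eP_\tau}\right)\log(e)\\ & - \exp\left(\frac{\sigma_e^2}{\mu_eP_\tau} - \frac{\tau}{\mu_m}\right)\left[E_1\left(\frac{\sigma_e^2}{\mu_eP_\tau}\right) - E_1\left(\frac{\sigma_e^2\tau}{\sigma_m^2\mu_e} + \frac{\sigma_e^2}{\mu_eP_\tau}\right)\right]\log(e). \end{aligned} \tag{5.28}$$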


Interestingly, if $P$ goes to infinity and $\tau$ goes to zero, (5.28) becomes

$$R_s < \log\left(1 + \frac{\sigma_e^2\mu_m}{\sigma_m^2\mu_e}\right).$$

Note that the right-hand side is the secrecy capacity with full CSI as $P$ goes to infinity. Therefore, for the block-fading model, the bursty signaling strategy approaches the secrecy capacity in the limit of large power. This is illustrated in Figure 5.8, which shows the secrecy capacity of a block-fading model with perfect knowledge of all fading coefficients and the secure rate achievable with the bursty signaling strategy for different values of the power $P$.
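As in the ergodic case, expression (5.28) can be sanity-checked by simulation. The sketch below repeats the same illustrative $E_1$ routine so that the block is self-contained, evaluates (5.28) with base-2 logarithms, and compares it with a Monte Carlo estimate of $\mathbb{E}[\mathbb{1}\{G_m > \tau\}(\log(1 + P_\tau G_m/\sigma_m^2) - \log(1 + P_\tau G_e/\sigma_e^2))^+]$ for i.i.d. Rayleigh fading, with $P_\tau = Pe^{\tau/\mu_m}$. The function names, parameters, and seed are hypothetical choices, not part of the text.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp1(x):
    """Exponential integral E_1(x) = int_x^inf exp(-y)/y dy for x > 0."""
    if x <= 1.0:
        total, term = -EULER_GAMMA - math.log(x), 1.0
        for k in range(1, 200):
            term *= -x / k              # term = (-x)^k / k!
            total -= term / k
            if abs(term / k) < 1e-17:
                break
        return total
    b, c = x + 1.0, 1e30                # modified Lentz continued fraction
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def bursty_block_rate(P, tau, mu_m, mu_e, var_m=1.0, var_e=1.0):
    """Right-hand side of (5.28) in bits per channel use."""
    p_tau = P * math.exp(tau / mu_m)        # P / P[G_m > tau] for Rayleigh fading
    log2e = math.log2(math.e)
    x_m = var_m / (mu_m * p_tau)            # sigma_m^2 / (mu_m P_tau)
    x_e = var_e / (mu_e * p_tau)            # sigma_e^2 / (mu_e P_tau)
    t_m = tau / mu_m
    t_e = var_e * tau / (var_m * mu_e)      # sigma_e^2 tau / (sigma_m^2 mu_e)
    rate = math.exp(-t_m) * math.log2(1 + tau * p_tau / var_m)
    rate += math.exp(x_m) * exp1(t_m + x_m) * log2e
    rate -= math.exp(x_m + x_e) * exp1(t_m + t_e + x_m + x_e) * log2e
    rate -= math.exp(x_e - t_m) * (exp1(x_e) - exp1(t_e + x_e)) * log2e
    return rate

def bursty_block_rate_mc(P, tau, mu_m, mu_e, var_m=1.0, var_e=1.0, n=100_000, seed=3):
    """Monte Carlo estimate of the expectation that (5.28) evaluates in closed form."""
    rng = random.Random(seed)
    p_tau = P * math.exp(tau / mu_m)
    total = 0.0
    for _ in range(n):
        g_m = rng.expovariate(1.0 / mu_m)
        g_e = rng.expovariate(1.0 / mu_e)
        if g_m > tau:
            diff = math.log2(1 + p_tau * g_m / var_m) - math.log2(1 + p_tau * g_e / var_e)
            total += max(diff, 0.0)         # per-block leakage is bounded, hence (.)^+
    return total / n

print(bursty_block_rate(10.0, 0.5, 1.0, 2.0), bursty_block_rate_mc(10.0, 0.5, 1.0, 2.0))
print(bursty_block_rate(1e6, 0.0, 1.0, 2.0), math.log2(1.5))   # high-power check
```

With $\tau = 0$ the threshold disappears and $P_\tau = P$, so the same function then evaluates $\mathbb{E}[(\log(1 + PG_m/\sigma_m^2) - \log(1 + PG_e/\sigma_e^2))^+]$, which is the average secrecy capacity of the quasi-static model with full CSI (Theorem 5.5).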


Figure 5.8 Secure communication rates over the Rayleigh block-fading channel with parameters $\mu_m = 1$, $\mu_e = 2$, and $\sigma_m^2 = \sigma_e^2 = 1$. (Axes: secure rates in bits/channel use versus average transmission power in dB; curves: full CSI, bursty signaling, no CSI.)

5.2.3 Quasi-static fading channels

In this last section, we consider the situation in which the fading coefficients $\{H_{m,i}\}_{i\geq 1}$ and $\{H_{e,i}\}_{i\geq 1}$ remain constant over the transmission of an entire codeword and change independently at random from one codeword to another. This contrasts with the ergodic-fading and block-fading models, in which every transmitted codeword experiences many fading realizations during transmission. This model is often called a quasi-static fading model, and, for each coherence interval characterized by fading realizations $(h_m, h_e)$, the model reduces to a Gaussian WTC defined by

$$Y_i = h_mX_i + N_{m,i} \quad\text{and}\quad Z_i = h_eX_i + N_{e,i}.$$


The input is subject to a power constraint $(1/n)\sum_{i=1}^{n}\mathbb{E}[|X_i|^2] \leq P$, which is interpreted as a short-term constraint and must be satisfied within each coherence interval. Again, this contrasts with the long-term power constraint we used for the ergodic-fading and block-fading models, and the short-term constraint of the quasi-static model prevents the transmitter from allocating power opportunistically depending on the fading gain; nevertheless, the transmitter can still adapt its coding rate to the realization of the fading coefficients. While we have seen in previous sections that the possibility of secure communications with ergodic fading is determined by the average fading realization, we will see that secure communications over quasi-static channels are determined by the instantaneous one.

If the transmitter, the legitimate receiver, and the eavesdropper have perfect knowledge of the instantaneous realizations of the fading coefficients $(h_m, h_e)$, the wiretap code used for each realization of the fading can be chosen opportunistically. The aggregate secure communications rate achievable over a long period of time is then given by the following theorem.
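Theorem 5.5 (Barros and Rodrigues). With full CSI, the average secrecy capacity of a quasi-static wiretap channel is

$$\bar{C}_s = \mathbb{E}_{G_mG_e}\left[C_s^{\mathrm{ins}}(G_m, G_e)\right],$$

where $C_s^{\mathrm{ins}}(g_m, g_e)$ is the instantaneous secrecy capacity, defined as

$$C_s^{\mathrm{ins}}(g_m, g_e) \triangleq \left(\log\left(1 + \frac{g_mP}{\sigma_m^2}\right) - \log\left(1 + \frac{g_eP}{\sigma_e^2}\right)\right)^+.$$

In the case of i.i.d. Rayleigh fading, $\bar{C}_s$ can be computed explicitly using the exponential-integral function as

$$\bar{C}_s = \exp\left(\frac{\sigma_m^2}{\mu_mP}\right)E_1\left(\frac{\sigma_m^2}{\mu_mP}\right)\log(e) - \exp\left(\frac{\sigma_m^2}{\mu_mP} + \frac{\sigma_e^2}{\mu_eP}\right)E_1\left(\frac{\sigma_m^2}{\mu_mP} + \frac{\sigma_e^2}{\mu_eP}\right)\log(e),$$

and one can check that

$$\lim_{P\to\infty}\bar{C}_s(P) = \log\left(1 + \frac{\sigma_e^2\mu_m}{\sigma_m^2\mu_e}\right).$$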

If the transmitter knows the fading coefficient $h_m$ of the main channel but does not know the fading coefficient $h_e$ of the eavesdropper's channel, then the average secrecy capacity for a quasi-static fading model is zero, no matter what the statistics of the channels are. In fact, since a codeword experiences a single realization of the fading gain, the probability of the eavesdropper obtaining a better instantaneous SNR for an entire codeword is always strictly positive, and no coding can guarantee secrecy. Nevertheless, one can still obtain insight into the security of wireless communications by taking a probabilistic view of security.

If we assume that the transmitter knows $h_m$, then the rate of the code used within each coherence interval can be adapted to guarantee reliability; however, without knowledge of $h_e$, the transmitter can use only a wiretap code targeted for a predefined secure communication rate $R$. Whenever the realization $g_e$ is such that $R < C_s^{\mathrm{ins}}(g_m, g_e)$, it follows from Remark 5.2 that the message is transmitted securely; however, if $R > C_s^{\mathrm{ins}}(g_m, g_e)$, then some information is leaked to the eavesdropper. This behavior can be characterized by using the notion of the outage probability of the secrecy capacity.


Definition 5.1. If the transmitter knows the fading coefficient of the main channel, the outage probability of the secrecy capacity is defined as

$$P_{\mathrm{out}}(R) \triangleq \mathbb{P}_{G_mG_e}\left[C_s^{\mathrm{ins}}(G_m, G_e) < R\right].$$

For Rayleigh fading, $P_{\mathrm{out}}(R)$ takes the closed-form expression

$$P_{\mathrm{out}}(R) = 1 - \frac{\mu_m}{\mu_m + 2^R\mu_e\,\sigma_m^2/\sigma_e^2}\exp\left(-\frac{(2^R - 1)\,\sigma_m^2}{\mu_mP}\right).$$

Figure 5.9 Outage probability of the secrecy capacity for various values of $\mu_m$ ($R = 0.1$, $\mu_e = 1$, $\sigma_m^2 = \sigma_e^2 = 1$).
Figure 5.9 illustrates the behavior of $P_{\mathrm{out}}(R)$ as a function of the transmit power $P$ for various values of the average fading gain $\mu_m$. The channel statistics must give Alice and Bob a clear advantage over Eve in order to achieve low values of the outage probability. The relevance of the outage approach is also very much application-dependent, insofar as even leaking information with a low probability might sometimes be unacceptable.

Note that $P_{\mathrm{out}}(R)$ is a decreasing function of $P$ and cannot be reduced by decreasing the transmission power. However, reducing the targeted secure transmission rate $R$ reduces the outage probability, and the minimum outage probability is obtained as $R$ goes to zero, that is

$$\lim_{R\to 0}P_{\mathrm{out}}(R) = \frac{\sigma_m^2\mu_e}{\sigma_e^2\mu_m + \sigma_m^2\mu_e},$$
which is always strictly positive. As expected, no matter what the transmitter does, some information is leaked to the eavesdropper with positive probability. Despite this seemingly disappointing result, it is worth noting that the outage probability is a pessimistic metric, which does not discriminate between events for which $R \gg C_s^{\mathrm{ins}}$ and events for which $R$ exceeds $C_s^{\mathrm{ins}}$ by a small amount. In addition, if the fading realizations are independent, outage events are independent of each other as well; as a result, a security leakage at a given time instant does not necessarily hinder security at later times.


Remark 5.7. If the transmitter does not have any CSI, both reliability and security need to be assessed in terms of outage. The definition of the outage probability can be modified to

$$P_{\text{out}}(R) = \mathbb{P}_{G_m,G_e}\left[C_s(G_m, G_e) < R,\ C_m(G_m) < R\right],$$

where $C_m(g_m) = \log\left(1 + g_m P/\sigma_m^2\right)$ is the instantaneous main channel capacity.
Conclusions and lessons learned


The secrecy capacity of a Gaussian WTC admits a simple characterization as the dif- 
ference between the main channel capacity and the eavesdropper’s channel capacity. 
Consequently, secure communication over a Gaussian WTC is possible if and only if 
the legitimate receiver obtains a higher SNR than does the eavesdropper. This result is 
somewhat disappointing because it seems to limit the scope of applications. However, 
as shown by the analysis of Gaussian source models, this limitation can be overcome 
by considering more powerful communication schemes exploiting feedback, such as 
secret-key agreement schemes. 

Our analysis of the MIMO Gaussian WTC leads to a severe conclusion: little can 
be done against the collusion of many eavesdroppers. Nevertheless, the MIMO model 
may be overly pessimistic because it ignores the communication requirements of the 
eavesdroppers. Most often, the bandwidth of eavesdroppers will be limited, which is 
likely to mitigate the detrimental impact of a collusion. On a more positive note, our study
shows that coding for secrecy is, in general, more powerful than beamforming alone. 

Perhaps surprisingly, the fluctuations of received SNR induced by fading in wireless 
transmissions are beneficial for security. If the instantaneous fading realizations can be 
accurately estimated by the transmitter, transmit power can be allocated opportunistically 
to the fading realizations for which the eavesdropper obtains a lower instantaneous SNR 
than that of the legitimate receiver. As a result, strictly positive secure communication 
rates are achievable even if, on average, the eavesdropper obtains a better SNR than that 
of the legitimate receiver. For some fading models, this is possible even if the transmitter 
does not have access to the eavesdropper’s instantaneous fading realization. 
Coding and system aspects


Coding for secrecy 


In this chapter, we discuss the construction of practical codes for secrecy. The design 
of codes for the wiretap channel turns out to be surprisingly difficult, and this area of 
information-theoretic security is still largely in its infancy. To some extent, the major 
obstacles in the road to secrecy capacity are similar to those that lay in the path to 
channel capacity: the random-coding arguments used to establish the secrecy capac- 
ity do not provide explicit code constructions. However, the design of wiretap codes 
is further impaired by the absence of a simple metric, such as a bit error rate, which 
could be evaluated numerically. Unlike codes designed for reliable communication, 
whose performance is eventually assessed by plotting a bit-error-rate curve, we cannot 
simulate an eavesdropper with unlimited computational power; hence, wiretap codes 
must possess enough structure to be provably secure. For certain channels, such as 
binary erasure wiretap channels, the information-theoretic secrecy constraint can be 
recast in terms of an algebraic property for a code-generator matrix. Most of the chap- 
ter focuses on such cases since this algebraic view of secrecy simplifies the analysis 
considerably. 

As seen in Chapter 4, the design of secret-key distillation strategies is a somewhat 
easier problem insofar as reliability and security can be handled separately by means 
of information reconciliation and privacy amplification. Essentially, the construction of 
coding schemes for key agreement reduces to the design of Slepian–Wolf-like codes for
information reconciliation, which can be done efficiently with low-density parity-check 
(LDPC) codes or turbo-codes. 

We start this chapter by clarifying the connection between secrecy and capacity- 
achieving codes (Section 6.1), which was used implicitly in Chapter 3 and Chapter 4, 
to highlight the insight that can be gained from the information-theoretic proofs. We 
then briefly recall some fundamental properties of linear codes and LDPC codes (Sec- 
tion 6.2), and we use these codes as building blocks for the construction of wiretap 
codes over the binary erasure wiretap channel (Section 6.3) and efficient Slepian–Wolf
codes for information reconciliation in secret-key distillation strategies (Section 6.4 and 
Section 6.5). We conclude with a discussion of secure communication over wiretap 
channels using secret-key distillation strategies (Section 6.6). 
6.1 Secrecy and capacity-achieving codes


A natural approach to constructing practical wiretap codes is to mimic the code structure used in the achievability proofs of Theorem 3.2 and Theorem 3.3. Specifically, we established the existence of codes for a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$ by partitioning a codebook with $\lceil 2^{nR}\rceil\lceil 2^{nR_d}\rceil$ codewords into $\lceil 2^{nR}\rceil$ bins of $\lceil 2^{nR_d}\rceil$ codewords each. The $\lceil 2^{nR}\rceil\lceil 2^{nR_d}\rceil$ codewords were chosen so that the legitimate receiver could decode reliably. In addition, the bins were constructed so that an eavesdropper knowing which bin is used could also decode reliably.

Each bin of codewords can be thought of as a subcode of a “mother code,” which is known in coding theory as a nested code structure. More importantly, a closer look at the proofs shows that these subcodes are implicitly capacity-achieving codes for the eavesdropper’s channel, since the rate $R_d$ of each subcode is chosen in (3.24) such that

$$R_d = I(X;Z) - \delta(\epsilon) \quad \text{for some small } \epsilon > 0.$$


This condition is somewhat buried in the technical details and it is worth clarifying the 
connection between secrecy and capacity-achieving codes with a more direct proof. 

Consider a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$, and let $\mathcal{C}$ be a code of length $n$ with $\lceil 2^{nR}\rceil$ disjoint subcodes $\{\mathcal{C}_i\}_{i=1}^{\lceil 2^{nR}\rceil}$ such that

$$\mathcal{C} = \bigcup_{i=1}^{\lceil 2^{nR}\rceil} \mathcal{C}_i.$$

For simplicity, we assume that $\mathcal{C}$ guarantees reliable communication over the main channel and analyze only its secrecy properties. Following the stochastic encoding suggested by the proof in Section 3.4.1, a message $m \in \llbracket 1, 2^{nR}\rrbracket$ is sent by transmitting a codeword chosen uniformly at random in the subcode $\mathcal{C}_m$. The following theorem provides a sufficient condition for this coding scheme to guarantee secrecy with respect to the eavesdropper.


Theorem 6.1 (Thangaraj et al.). If each subcode in the set $\{\mathcal{C}_i\}_{i=1}^{\lceil 2^{nR}\rceil}$ stems from a sequence of capacity-achieving codes over the eavesdropper’s channel as $n$ goes to infinity, then

$$\lim_{n\to\infty} \frac{1}{n} I(M; Z^n) = 0.$$

Proof. Let $C_e$ denote the capacity of the eavesdropper’s channel. Suppose each subcode in $\{\mathcal{C}_i\}_{i=1}^{\lceil 2^{nR}\rceil}$ stems from a sequence of capacity-achieving codes for the eavesdropper’s channel. Then, for any $\epsilon > 0$, there exists $n$ large enough that

$$\forall i \in \llbracket 1, 2^{nR}\rrbracket \quad \frac{1}{n} I(X^n; Z^n | M = i) > C_e - \epsilon.$$

Consequently, $(1/n) I(X^n; Z^n | M) > C_e - \epsilon$ as well. We now expand the mutual information $I(M; Z^n)$ as

$$\begin{aligned} I(M; Z^n) &= I(Z^n; X^n M) - I(X^n; Z^n | M) \\ &= I(X^n; Z^n) + I(M; Z^n | X^n) - I(X^n; Z^n | M). \end{aligned}$$

Note that $I(M; Z^n | X^n) = 0$ since $M \to X^n \to Z^n$ forms a Markov chain. In addition, $I(X^n; Z^n) \leq n C_e$ since the eavesdropper’s channel is memoryless. Therefore,

$$\frac{1}{n} I(M; Z^n) = \frac{1}{n} I(X^n; Z^n) - \frac{1}{n} I(X^n; Z^n | M) < C_e - (C_e - \epsilon) = \epsilon.$$

Since $\epsilon > 0$ can be chosen arbitrarily small, $\lim_{n\to\infty} (1/n) I(M; Z^n) = 0$.


Theorem 6.1 naturally suggests a code-design methodology based on nested codes 
and capacity-achieving codes over the eavesdropper’s channel. Unfortunately, practi- 
cal families of capacity-achieving codes are known for only a few channels, such as 
LDPC codes for binary erasure channels and polar codes for binary-input symmetric
channels; even for these channels, constructing a nested code with capacity-achieving 
subcodes remains a challenging task. Despite this pessimistic observation, note that 
the use of capacity-achieving codes for the eavesdropper’s channel is merely a suffi- 
cient condition for secrecy, which leaves open the possibility that alternative approaches 
might turn out to be more successful. For instance, the code constructions for binary 
erasure wiretap channels presented in Section 6.3 are based on a somewhat different 
methodology. 


Remark 6.1. The connection between secrecy and capacity-achieving codes also holds 
for secret-key distillation strategies. In fact, for the secret-key distillation strategies 
based on Slepian–Wolf codes analyzed in Section 4.2.2, the number of bins in (4.16) was chosen arbitrarily close to the fundamental limit of source coding with side information. Nevertheless, the alternative approach based on sequential key distillation circumvents this issue. It provides a design methodology that does not depend on Slepian–Wolf codes achieving the fundamental limits of source coding with side
information. 


6.2 Low-density parity-check codes


Low-density parity-check (LDPC) codes constitute a family of graph-based block codes, 
whose performance approaches the fundamental limits of channel coding or source 
coding when the block length is large, and which can also be decoded efficiently 
with an iterative algorithm. Since we use LDPC codes extensively in the remainder 
of this chapter, we devote this section to a brief review (without proofs) of binary 
LDPC codes and their properties. We refer the interested reader to the textbook by 
Richardson and Urbanke [108] for a comprehensive and in-depth exposition on the 
subject. 


6.2.1 Binary linear block codes and LDPC codes


Before discussing LDPC codes, we review some basics of binary linear block codes; 
in particular, the notions of dual code and coset code will be useful for secrecy 
codes. 
Definition 6.1. A binary $(n, n-k)$ block code is a set $\mathcal{C} \subseteq \mathrm{GF}(2)^n$ of cardinality¹ $|\mathcal{C}| = 2^{n-k}$. The elements of $\mathcal{C}$ are called codewords. Associated with the code is a bijective mapping between $\mathrm{GF}(2)^{n-k}$ and $\mathcal{C}$, which is called an encoder. The elements of $\mathrm{GF}(2)^{n-k}$ are called messages. An $(n, n-k)$ code is linear if $\mathcal{C}$ is an $(n-k)$-dimensional subspace of $\mathrm{GF}(2)^n$. The rate of a code $\mathcal{C}$ is defined as $R = (n-k)/n$.


A linear code $\mathcal{C}$ is represented concisely by a matrix $G \in \mathrm{GF}(2)^{(n-k)\times n}$, called the generator matrix, whose rows form a basis of $\mathcal{C}$. An encoder can then be described by the matrix operation

$$m \mapsto G^{\mathsf{T}} m.$$


A generator matrix G specifies a code completely, but notice that G is not unique (any 
basis of C can be used to construct G). Different generator matrices define different 
encoders. 


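Definition 6.2. The dual of an $(n, n-k)$ linear code $\mathcal{C}$ is the set $\mathcal{C}^\perp$ defined as

$$\mathcal{C}^\perp \triangleq \left\{ c \in \mathrm{GF}(2)^n : \forall x \in \mathcal{C}\ \ \bigoplus_{i=1}^{n} c_i x_i = 0 \right\}.$$

In other words, $\mathcal{C}^\perp$ contains all vectors of $\mathrm{GF}(2)^n$ that are orthogonal to $\mathcal{C}$.

The reader can check that $\mathcal{C}^\perp$ is actually an $(n, k)$ linear code. A generator matrix of $\mathcal{C}^\perp$ is denoted by a matrix $H \in \mathrm{GF}(2)^{k\times n}$ and is called the parity-check matrix of $\mathcal{C}$. Note that $H$ satisfies $G H^{\mathsf{T}} = 0$ and that all codewords $x \in \mathcal{C}$ must satisfy the parity-check equations $Hx = 0$.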


Definition 6.3. For a linear $(n, n-k)$ code $\mathcal{C}$ with parity-check matrix $H$ and for $s \in \mathrm{GF}(2)^k$, the set

$$\mathcal{C}(s) = \{ x \in \mathrm{GF}(2)^n : Hx = s \}$$

is called the coset code of $\mathcal{C}$ with syndrome $s \in \mathrm{GF}(2)^k$. In particular, $\mathcal{C} = \mathcal{C}(0)$.

A coset code can also be described as a translation of the original code. In fact, if $x' \in \mathrm{GF}(2)^n$ is such that $Hx' = s$, then

$$\mathcal{C}(s) = \{ x \oplus x' : x \in \mathcal{C} \}.$$

A sequence $x' \in \mathcal{C}(s)$ with minimum weight is called a coset leader of $\mathcal{C}(s)$. It is possible to show that an $(n, n-k)$ code has $2^k$ disjoint cosets, which form a partition of $\mathrm{GF}(2)^n$.
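Definitions 6.1 to 6.3 can be made concrete with a brute-force enumeration of a toy code. In the following Python sketch, $G$ and $H$ are hypothetical matrices chosen for illustration so that $GH^{\mathsf T} = 0$, and the helper names are our own. Exhaustive enumeration is only practical for very small $n$.

```python
from itertools import product

def codewords(G):
    """All codewords G^T m, m in GF(2)^(n-k), of the code spanned by the rows of G."""
    n = len(G[0])
    words = set()
    for m in product((0, 1), repeat=len(G)):
        x = (0,) * n
        for bit, row in zip(m, G):
            if bit:
                x = tuple(a ^ b for a, b in zip(x, row))
        words.add(x)
    return words

def dual_code(C, n):
    """Brute-force dual code: every y in GF(2)^n orthogonal (mod 2) to all of C."""
    return {y for y in product((0, 1), repeat=n)
            if all(sum(a & b for a, b in zip(x, y)) % 2 == 0 for x in C)}

def syndrome(H, x):
    """Hx over GF(2)."""
    return tuple(sum(h & b for h, b in zip(row, x)) % 2 for row in H)

def cosets(H, n):
    """Partition GF(2)^n into the coset codes C(s) = {x : Hx = s}."""
    groups = {}
    for x in product((0, 1), repeat=n):
        groups.setdefault(syndrome(H, x), set()).add(x)
    return groups

# Hypothetical (5, 2) code (n = 5, n - k = 2, k = 3) and a parity-check matrix for it
G = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]
H = [(1, 0, 1, 0, 0), (1, 1, 0, 1, 0), (0, 1, 0, 0, 1)]
C = codewords(G)
groups = cosets(H, 5)
print(len(C), len(dual_code(C, 5)), len(groups))
```

The $2^k = 8$ syndromes index $8$ disjoint cosets of $4$ sequences each. Every coset is a translate of $\mathcal{C} = \mathcal{C}(0)$.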

Binary LDPC codes are a special class of binary linear codes, characterized by a sparse 
parity-check matrix H, which contains a much smaller number of ones than zeros. In 
other words, the parity-check equations defining the code involve only a small number 
of bits. Rather than specifying the LDPC code in terms of its parity-check matrix, it is 
convenient to use a graphical representation of $H$ called the Tanner graph. The Tanner graph of $H \in \mathrm{GF}(2)^{k\times n}$ is a bipartite graph with $n$ variable nodes and $k$ check nodes connected by edges. Each variable node represents a bit $x_i$ in a codeword, while each check node represents a parity-check equation satisfied by the codewords. Specifically, letting $H = (h_{ji})_{k\times n}$, the $j$th check node $c_j$ represents the parity-check equation

$$\bigoplus_{i=1}^{n} x_i h_{ji} = 0.$$

An edge connects variable node $x_i$ to check node $c_j$ if and only if $x_i$ is involved in the $j$th parity-check equation, that is, $h_{ji} = 1$. The degree of a node is defined as the number of edges incident to it. As an example, Figure 6.1 illustrates a parity-check matrix and its corresponding Tanner graph for a binary linear code of length $n = 10$, in which all variable nodes have degree 3 and all check nodes have degree 6.

Figure 6.1 A parity-check matrix and its corresponding Tanner graph for a code with blocklength $n = 10$.

¹ The usual convention is to consider $(n, k)$ block codes, so that the number of codewords is $2^k$ rather than $2^{n-k}$. Nevertheless, this alternative convention simplifies our notation later on.

Given a Tanner graph, it is possible to compute its variable-node edge-degree distribution $\{\lambda_i\}_{i\geq 1}$, in which $\lambda_i$ is the fraction of edges incident on a variable node with degree $i$. Similarly, the check-node edge-degree distribution is $\{\rho_j\}_{j\geq 1}$, in which $\rho_j$ is the fraction of edges incident on a check node with degree $j$. These edge-degree distributions (degree distributions for short) are often represented in compact form by the following polynomials in $x$:

$$\lambda(x) = \sum_{i\geq 1} \lambda_i x^{i-1} \quad\text{and}\quad \rho(x) = \sum_{j\geq 1} \rho_j x^{j-1}.$$

The rate of the code is directly related to the edge-degree distribution by

$$R = 1 - \frac{\int_0^1 \rho(x)\,dx}{\int_0^1 \lambda(x)\,dx}.$$

Strictly speaking, this expression is the design rate $1 - k/n$, which equals the actual rate when $H$ has full row rank.
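The edge-perspective distributions and the rate formula can be computed directly from a parity-check matrix. In the following Python sketch, the helper names are hypothetical and exact arithmetic uses `fractions.Fraction`. Each $\lambda_i$ is obtained by counting the edges attached to degree-$i$ variable nodes. Since $\int_0^1 x^{d-1}\,dx = 1/d$, the formula reduces to the design rate $1 - k/n$ when every node has at least one edge.

```python
from collections import Counter
from fractions import Fraction

def edge_degree_distributions(H):
    """Edge-perspective degree distributions of the Tanner graph of H.
    Returns dicts {degree i: lambda_i} and {degree j: rho_j}."""
    k, n = len(H), len(H[0])
    var_deg = [sum(H[j][i] for j in range(k)) for i in range(n)]
    chk_deg = [sum(row) for row in H]
    edges = sum(chk_deg)
    lam, rho = Counter(), Counter()
    for d in var_deg:
        if d:  # a node of degree d carries d of the edges
            lam[d] += Fraction(d, edges)
    for d in chk_deg:
        if d:
            rho[d] += Fraction(d, edges)
    return dict(lam), dict(rho)

def design_rate(lam, rho):
    """R = 1 - int rho / int lambda, using int_0^1 x^(d-1) dx = 1/d."""
    int_lam = sum(c / d for d, c in lam.items())
    int_rho = sum(c / d for d, c in rho.items())
    return 1 - int_rho / int_lam

# Small hypothetical parity-check matrix (k = 2 checks, n = 4 variables)
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
lam, rho = edge_degree_distributions(H)
print(lam, rho, design_rate(lam, rho))
```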
Note that a parity-check matrix $H$ specifies a unique Tanner graph and thus a unique degree distribution. A degree distribution, by contrast, specifies only an ensemble of codes with the same rate $R$ (for instance, all permutations of nodes in a graph have the same degree distribution). Fortunately, for large block lengths, all codes within a given ensemble have roughly the same decoding performance. Hence, LDPC codes are often specified by their degree distributions $(\lambda(x), \rho(x))$ alone.

An LDPC code is called regular if all variable nodes have the same degree, and all 
check nodes have the same degree. Otherwise, it is called irregular. 


Example 6.1. A rate-1/2 regular (3, 6) LDPC code is such that all variable nodes have degree 3, and all check nodes have degree 6. Its degree distributions are simply

$$\lambda(x) = x^2 \quad\text{and}\quad \rho(x) = x^5.$$

Example 6.2. The following irregular degree distributions correspond to another rate-1/2 LDPC code:

$$\begin{aligned} \lambda(x) &= 0.106257\,x + 0.486659\,x^2 + 0.010390\,x^{10} + 0.396694\,x^{19}, \\ \rho(x) &= 0.5\,x^7 + 0.5\,x^8. \end{aligned}$$
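The rate formula can be checked on Examples 6.1 and 6.2 by integrating the polynomials term by term. In the following Python sketch, the dictionaries map an exponent $e$ to the coefficient of $x^e$, so $\lambda_i$ sits at exponent $i-1$. The function name is hypothetical. Because the coefficients of Example 6.2 are rounded to six decimals, its rate comes out as $1/2$ only approximately.

```python
def design_rate(lam, rho):
    """Design rate from (lambda, rho) given as {exponent e: coefficient of x^e}.
    Uses int_0^1 x^e dx = 1/(e + 1)."""
    int_lam = sum(c / (e + 1) for e, c in lam.items())
    int_rho = sum(c / (e + 1) for e, c in rho.items())
    return 1.0 - int_rho / int_lam

# Example 6.1: lambda(x) = x^2, rho(x) = x^5
regular = ({2: 1.0}, {5: 1.0})
# Example 6.2, with the coefficients as printed
irregular = ({1: 0.106257, 2: 0.486659, 10: 0.010390, 19: 0.396694},
             {7: 0.5, 8: 0.5})
for name, (lam, rho) in [("regular (3,6)", regular), ("irregular", irregular)]:
    print(name, round(design_rate(lam, rho), 4))
```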


Message-passing decoding algorithm 


Let $\mathcal{C}$ be an $(n, n-k)$ LDPC code with parity-check matrix $H \in \mathrm{GF}(2)^{k\times n}$. Consider a codeword $x = (x_1, \dots, x_n)^{\mathsf{T}} \in \mathcal{C}$ whose bits are transmitted over a binary-input memoryless channel $(\{0,1\}, p_{Y|X}(y|x), \mathcal{Y})$. Let $y = (y_1, \dots, y_n)^{\mathsf{T}}$ denote the vector of received symbols. The success of LDPC codes stems from the existence of a computationally efficient algorithm to approximate the a-posteriori log-likelihood ratios (LLRs)

$$\lambda_i = \log\left(\frac{\mathbb{P}[X_i = 0 \mid y]}{\mathbb{P}[X_i = 1 \mid y]}\right) \quad\text{for } i \in \llbracket 1, n\rrbracket.$$

The sign of $\lambda_i$ provides the most likely value of the bit $x_i$, while the magnitude $|\lambda_i|$ provides a measure of the reliability associated with the decision. For instance, if $|\lambda_i| = 0$, the bit $x_i$ is equally likely to be zero or one. In contrast, if $|\lambda_i| = \infty$, there is no uncertainty regarding the value of $x_i$.

For $i \in \llbracket 1, n\rrbracket$, we let $\mathcal{N}(i)$ denote the indices of check nodes connected to the variable node $x_i$ in the Tanner graph. The set can be obtained from the parity-check matrix $H = (h_{ji})_{k\times n}$ as

$$\mathcal{N}(i) = \{ j \in \llbracket 1, k\rrbracket : h_{ji} = 1 \}.$$
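Table 6.1 Belief-propagation algorithm

Initialization.
  ▸ For each $i \in \llbracket 1, n\rrbracket$ and for each $j \in \mathcal{N}(i)$:
      $v_{ij}^{(0)} = u_{ji}^{(0)} = 0$.
  ▸ For each $i \in \llbracket 1, n\rrbracket$:
      $\lambda_i^{\mathrm{INT}} = \log\left(\dfrac{p_{Y|X}(y_i|0)}{p_{Y|X}(y_i|1)}\right)$.
Iterations. For each iteration $l \in \llbracket 1, l_{\max}\rrbracket$:
  ▸ For each $i \in \llbracket 1, n\rrbracket$ and for each $j \in \mathcal{N}(i)$:
      $v_{ij}^{(l)} = \lambda_i^{\mathrm{INT}} + \sum_{m \in \mathcal{N}(i)\setminus j} u_{mi}^{(l-1)}$.
  ▸ For each $j \in \llbracket 1, k\rrbracket$ and for each $i \in \mathcal{M}(j)$:
      $u_{ji}^{(l)} = 2\tanh^{-1}\left(\prod_{m \in \mathcal{M}(j)\setminus i} \tanh\left(\dfrac{v_{mj}^{(l)}}{2}\right)\right)$.
Extrinsic information. For all $i \in \llbracket 1, n\rrbracket$:
      $\lambda_i^{\mathrm{EXT}} = \sum_{m \in \mathcal{N}(i)} u_{mi}^{(l_{\max})}$.
Hard decisions. For all $i \in \llbracket 1, n\rrbracket$:
      $\hat{x}_i = \frac{1}{2}\left(1 - \mathrm{sign}\left(\lambda_i^{\mathrm{INT}} + \lambda_i^{\mathrm{EXT}}\right)\right)$.

Similarly, for $j \in \llbracket 1, k\rrbracket$, we let $\mathcal{M}(j)$ denote the indices of variable nodes connected to the check node $c_j$ in the Tanner graph, that is,

$$\mathcal{M}(j) = \{ i \in \llbracket 1, n\rrbracket : h_{ji} = 1 \}.$$

The LLRs $\{\lambda_i\}_{i=1}^{n}$ can be approximated using the iterative algorithm described in Table 6.1. This algorithm is called the “belief-propagation” algorithm or the “sum–product” algorithm.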
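The following Python sketch transcribes the updates of Table 6.1 literally. It is dictionary-based and not optimized. It assumes $H$ is given as a list of rows and that the intrinsic LLRs have already been computed from the channel. Clipping the product before $\tanh^{-1}$ is our own implementation choice to avoid infinite messages. The toy example uses a length-3 repetition code, whose Tanner graph is cycle-free, over a hypothetical binary symmetric channel with cross-over probability 0.1.

```python
import math

def belief_propagation(H, llr_int, max_iter=20):
    """Sum-product decoding with the update rules of Table 6.1.

    H: parity-check matrix as a list of k rows of n bits.
    llr_int: intrinsic LLRs log(p(y_i|0) / p(y_i|1)), one per variable node.
    Returns (lambda_INT + lambda_EXT for each bit, hard decisions)."""
    k, n = len(H), len(H[0])
    N = [[j for j in range(k) if H[j][i]] for i in range(n)]  # checks of variable i
    M = [[i for i in range(n) if H[j][i]] for j in range(k)]  # variables of check j
    u = {(j, i): 0.0 for j in range(k) for i in M[j]}         # u_ji^(0) = 0
    for _ in range(max_iter):
        # variable-to-check messages v_ij^(l) use the check messages of iteration l-1
        v = {(i, j): llr_int[i] + sum(u[(m, i)] for m in N[i] if m != j)
             for i in range(n) for j in N[i]}
        # check-to-variable messages u_ji^(l) via the tanh rule
        for j in range(k):
            for i in M[j]:
                prod = 1.0
                for m in M[j]:
                    if m != i:
                        prod *= math.tanh(v[(m, j)] / 2)
                prod = min(max(prod, -1 + 1e-15), 1 - 1e-15)  # keep atanh finite
                u[(j, i)] = 2 * math.atanh(prod)
    total = [llr_int[i] + sum(u[(m, i)] for m in N[i]) for i in range(n)]
    hard = [0 if t >= 0 else 1 for t in total]
    return total, hard

# Length-3 repetition code: its Tanner graph is a path, hence cycle-free
H = [[1, 1, 0],
     [0, 1, 1]]
p = 0.1                      # hypothetical BSC cross-over probability
y = [0, 1, 0]
llr = [math.log((1 - p) / p) if b == 0 else math.log(p / (1 - p)) for b in y]
total, hard = belief_propagation(H, llr)
print(total, hard)
```

Because this Tanner graph has no cycles, the returned values should match the exact a-posteriori LLRs. Here that value is $\log 9$ for every bit, in line with Theorem 6.2.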

Figure 6.2 Illustration of message-passing behavior for the belief-propagation algorithm.

As illustrated in Figure 6.2, the belief-propagation algorithm belongs to the class of “message-passing” algorithms. The quantities $v_{ij}^{(l)}$ and $u_{ji}^{(l)}$, which are updated at each iteration $l$, can be understood as “messages” exchanged between the variable nodes and check nodes along the edges of the Tanner graph. The final hard decision for each bit $x_i$ is based on two terms. The first is $\lambda_i^{\mathrm{INT}}$, called the intrinsic information (or intrinsic LLR) because it depends only on the current observation $y_i$. The second is $\lambda_i^{\mathrm{EXT}}$, called the extrinsic information (or extrinsic LLR) because it contains the information about $x_i$ provided by other observations. The usefulness of the algorithm is justified by the following result.


Theorem 6.2. If the Tanner graph of an LDPC code does not contain cycles, then the values $\{\lambda_i^{\mathrm{INT}} + \lambda_i^{\mathrm{EXT}}\}_i$ computed by the message-passing algorithm converge to the true a-posteriori LLRs $\{\lambda_i\}_i$. The hard decisions are then equivalent to bit-wise maximum a-posteriori (MAP) estimates.

In practice, the message-passing algorithm performs reasonably well even if the Tanner graph contains a few cycles. The complexity of each iteration is linear in the number of edges. This makes the algorithm particularly well suited to codes with sparse Tanner graphs, such as LDPC codes.


Properties of LDPC codes under message-passing decoding 


Definition 6.4. Consider a set of binary-input memoryless channels that are all characterized by a parameter $\alpha$. The set of channels is ordered if $\alpha_1 < \alpha_2$ implies that the channel with parameter $\alpha_2$ is stochastically degraded with respect to the channel with parameter $\alpha_1$.

Definition 6.5. A binary-input memoryless channel $(\{\pm 1\}, p_{Y|X}(y|x), \mathcal{Y})$ is output-symmetric if the transition probabilities are such that

$$\forall y \in \mathcal{Y} \quad p_{Y|X}(y|-1) = p_{Y|X}(-y|1).$$

Examples of ordered families of binary-input symmetric-output channels include the following:
- binary symmetric channels with cross-over probability $p \in [0, 1/2]$;
- binary erasure channels with erasure probability $\epsilon$ (after relabeling of the inputs);
- binary-input additive white Gaussian noise channels with noise variance $\sigma^2$ under the same input power constraint.


Theorem 6.3 (Richardson and Urbanke). Consider an LDPC code of length $n$ chosen uniformly at random from the ensemble of codes with degree distributions $(\lambda(x), \rho(x))$ and used over an ordered family of binary-input symmetric-output channels with parameter $\alpha$. Then, the following results hold.

1. Convergence to the cycle-free case: as $n$ goes to infinity, the neighborhood of any fixed depth around a given node of the Tanner graph is cycle-free with probability approaching one.

2. Concentration around the average: as $n$ goes to infinity, the error-decoding capability of the code under message-passing decoding converges to the average error-decoding capability of the code ensemble.

3. Threshold behavior: there exists a channel parameter $\alpha^*$, called the threshold, such that the bit error probability goes to zero as $n$ goes to infinity if and only if $\alpha < \alpha^*$.


Theorem 6.3 simplifies the design of LDPC codes tremendously. It states that it is sufficient to analyze the average performance of an ensemble of LDPC codes with given degree distributions $(\lambda(x), \rho(x))$ rather than focus on an individual code. For large length $n$, the probability of error with belief-propagation decoding of most codes in the ensemble is close to the probability of error averaged over the ensemble. Hence, to construct good codes with rate $R$, it suffices to optimize the degree distributions $(\lambda(x), \rho(x))$ so as to maximize the threshold $\alpha^*$. This optimization is, in general, non-convex in the degree distributions. It can nevertheless be solved numerically by combining two algorithms: density evolution, an efficient algorithm that computes the threshold of given degree distributions $(\lambda(x), \rho(x))$, and differential evolution, a heuristic genetic optimization algorithm.


Example 6.3. Consider the family of binary erasure channels with parameter $\alpha$, the erasure probability. The threshold of a regular (3, 6) LDPC code is $\alpha^* \approx 0.42$. The threshold of a rate-1/2 irregular code with distribution as in Example 6.2 is $\alpha^* \approx 0.4741$.
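Density evolution is particularly simple on the BEC. There, each message is characterized by a single erasure probability that evolves as $x_l = \alpha\,\lambda(1 - \rho(1 - x_{l-1}))$ with $x_0 = \alpha$. The following Python sketch uses hypothetical function names and a bisection with a finite iteration budget, so its estimate is slightly conservative. It can be used to reproduce the thresholds quoted in Example 6.3. For the regular (3, 6) ensemble it yields approximately 0.4294, consistent with the value above.

```python
def poly(coeffs, x):
    """Evaluate sum_e c_e * x**e for coeffs = {exponent e: coefficient c_e}."""
    return sum(c * x ** e for e, c in coeffs.items())

def de_converges(alpha, lam, rho, max_iter=10_000, tol=1e-10):
    """BEC density evolution: x_l = alpha * lambda(1 - rho(1 - x_{l-1})), x_0 = alpha.
    Returns True if the message erasure probability is driven to (almost) zero."""
    x = alpha
    for _ in range(max_iter):
        x = alpha * poly(lam, 1.0 - poly(rho, 1.0 - x))
        if x < tol:
            return True
    return False

def bec_threshold(lam, rho, steps=25):
    """Bisection: the threshold is the largest alpha for which DE converges."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if de_converges(mid, lam, rho):
            lo = mid
        else:
            hi = mid
    return lo

regular = ({2: 1.0}, {5: 1.0})                        # Example 6.1
irregular = ({1: 0.106257, 2: 0.486659, 10: 0.010390, 19: 0.396694},
             {7: 0.5, 8: 0.5})                        # Example 6.2
t_reg = bec_threshold(*regular)
t_irr = bec_threshold(*irregular)
print(round(t_reg, 4), round(t_irr, 4))
```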


Note that Theorem 6.3 characterizes a threshold for the asymptotic behavior of the 
bit error probability. This result can be refined and, as stated in the following theorem, 
one can show that the same threshold also characterizes the behavior of the block error 
probability for some LDPC ensembles. 


Theorem 6.4 (Jin and Richardson). If the degree distributions $(\lambda(x), \rho(x))$ of an LDPC ensemble do not contain any variable nodes of degree 2, then the bit error probability threshold is also the block error probability threshold.


LDPC ensembles with high thresholds usually have a high fraction of variable nodes of degree 2. It may therefore seem that Theorem 6.4 prevents us from obtaining high thresholds for the block error probability. However, it is possible to strengthen Theorem 6.4 to accommodate some fraction of degree-2 nodes. The presentation of this result goes beyond the scope of this chapter, and we refer the interested reader to [109] for more details.


6.3 Secrecy codes for the binary erasure wiretap channel

In this section, we restrict our attention to the binary erasure wiretap channel illustrated in Figure 6.3, in which the main channel is noiseless while the eavesdropper’s channel is a BEC with erasure probability $\epsilon$. From Corollary 3.1, the secrecy capacity of this channel is $C_s = 1 - (1 - \epsilon) = \epsilon$.

Figure 6.3 Binary erasure wiretap channel.

Following the observations made in Section 6.1, we investigate a code construction based on nested codes. Since any codeword sent by the transmitter is received without errors by the legitimate receiver, the construction is much simpler than in the general case. For any set of disjoint subcodes $\{\mathcal{C}_i\}$, the mother code $\mathcal{C} = \bigcup_i \mathcal{C}_i$ is always a reliable code for the (noiseless) main channel.

A set of subcodes that leads to a particularly simple stochastic encoder consists of an $(n, n-k)$ binary linear code $\mathcal{C}_0$ and its cosets. For this choice of subcodes, the mother code $\mathcal{C}$ is

$$\mathcal{C} = \bigcup_{s \in \mathrm{GF}(2)^k} \mathcal{C}_0(s) = \mathrm{GF}(2)^n.$$

The corresponding stochastic encoding procedure is called coset coding. It encodes a message $m \in \mathrm{GF}(2)^k$ by selecting a codeword uniformly at random in the coset code $\mathcal{C}_0(m)$ of $\mathcal{C}_0$ with syndrome $m$. The following proposition shows that the encoding and decoding operations in coset coding can be implemented efficiently with matrix multiplications.


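Proposition 6.1. Let $\mathcal{C}_0$ be an $(n, n-k)$ binary linear code. Then there exist a generator matrix $G \in \mathrm{GF}(2)^{(n-k)\times n}$ and a parity-check matrix $H \in \mathrm{GF}(2)^{k\times n}$ for $\mathcal{C}_0$, and a matrix $G' \in \mathrm{GF}(2)^{k\times n}$, such that

• the encoder maps a message $m$ to a codeword as
$$m \mapsto \begin{pmatrix} G'^{\mathsf{T}} & G^{\mathsf{T}} \end{pmatrix} \begin{pmatrix} m \\ v \end{pmatrix},$$
where the vector $v \in \mathrm{GF}(2)^{n-k}$ is chosen uniformly at random;
• the decoder maps a codeword $x$ to a message as
$$x \mapsto Hx.$$

Proof. Let $\{g_i\}_{i=1}^{n-k}$ be a basis of $\mathcal{C}_0$ and let $G$ be a generator matrix whose rows are the vectors $\{g_i^{\mathsf{T}}\}_{i=1}^{n-k}$. The set $\{g_i\}_{i=1}^{n-k}$, which is linearly independent, can be completed by a linearly independent set $\{h_i\}_{i=1}^{k}$ to obtain a basis of $\mathrm{GF}(2)^n$. Let $G'$ be a matrix whose rows are the vectors $\{h_i^{\mathsf{T}}\}_{i=1}^{k}$.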
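Let us now consider the encoding of a message $m \in \mathrm{GF}(2)^k$ as

$$m \mapsto \begin{pmatrix} G'^{\mathsf{T}} & G^{\mathsf{T}} \end{pmatrix} \begin{pmatrix} m \\ v \end{pmatrix} = \sum_{i=1}^{k} m_i h_i + \sum_{i=1}^{n-k} v_i g_i, \qquad (6.1)$$

where $v \in \mathrm{GF}(2)^{n-k}$ is chosen uniformly at random. The term $\sum_{i=1}^{n-k} v_i g_i$ corresponds to the choice of a codeword uniformly at random in $\mathcal{C}_0$. Therefore, the operation (6.1) is equivalent to coset coding if we can prove that two different messages $m$ and $m'$ generate sequences in different cosets of $\mathcal{C}_0$. Assume that two messages $m$ and $m'$ are encoded as sequences $x$ and $x'$ in the same coset with coset leader $e \in \mathrm{GF}(2)^n$. Then, there exist codewords $c_1, c_2 \in \mathcal{C}_0$ such that

$$x = e + c_1 \quad\text{and}\quad x' = e + c_2.$$

Consequently, $x + x' = c_1 + c_2 \in \mathcal{C}_0$. By (6.1), $x + x' = \sum_{i=1}^{k} (m_i + m'_i) h_i + \sum_{i=1}^{n-k} (v_i + v'_i) g_i$, so $\sum_{i=1}^{k} (m_i + m'_i) h_i \in \mathcal{C}_0$. Since $\{h_i\} \cup \{g_i\}$ is a basis of $\mathrm{GF}(2)^n$, this is impossible unless $m = m'$.

It remains to prove that there exists a parity-check matrix $H$ such that $Hx = m$. Let $H'$ be an arbitrary parity-check matrix of $\mathcal{C}_0$. In general, if $x$ is obtained from $m$ according to (6.1), then $H'x \neq m$. However, since $H' g_i = 0$, we have $H'x = \sum_{i=1}^{k} m_i H' h_i$, which does not depend on $v$. The mapping $m \mapsto H'x$ is therefore a linear map from $\mathrm{GF}(2)^k$ to $\mathrm{GF}(2)^k$. It is injective, because distinct messages lie in distinct cosets and hence have distinct syndromes, and therefore bijective. Consequently, there exists an invertible matrix $A \in \mathrm{GF}(2)^{k\times k}$ such that

$$AH'x = m.$$

Note that $H = AH'$ is another parity-check matrix for $\mathcal{C}_0$ and is such that $Hx = m$.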
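A small special case makes Proposition 6.1 concrete. The following Python sketch assumes a hypothetical systematic form $G = [P \mid I_{n-k}]$. In that case, $G' = [I_k \mid 0]$ completes a basis of $\mathrm{GF}(2)^n$, and $H = [I_k \mid P^{\mathsf T}]$ already satisfies $Hx = m$, so no change of basis $A$ is needed. For a general $G$, one would have to compute $G'$ and $A$ as in the proof. The helper names and the matrix $P$ are illustrative.

```python
import random

def gf2_matvec(A, x):
    """Matrix-vector product over GF(2); A is a list of rows."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def systematic_coset_code(P):
    """Hypothetical construction from an (n-k) x k binary matrix P:
    G = [P | I_{n-k}] generates C_0, G' = [I_k | 0] completes a basis of GF(2)^n,
    and H = [I_k | P^T] is a parity-check matrix with H G'^T = I_k."""
    r, k = len(P), len(P[0])                                   # r = n - k
    G = [P[a] + [int(a == b) for b in range(r)] for a in range(r)]
    Gp = [[int(i == j) for j in range(k)] + [0] * r for i in range(k)]
    H = [[int(i == j) for j in range(k)] + [P[a][i] for a in range(r)]
         for i in range(k)]
    return G, Gp, H

def coset_encode(m, G, Gp, rng):
    """x = G'^T m + G^T v with v uniform: a random element of the coset of m."""
    v = [rng.randrange(2) for _ in range(len(G))]
    x = [0] * len(G[0])
    for bit, row in list(zip(m, Gp)) + list(zip(v, G)):
        if bit:
            x = [a ^ b for a, b in zip(x, row)]
    return x

def coset_decode(x, H):
    """Decoding is the syndrome computation x -> Hx."""
    return gf2_matvec(H, x)

G, Gp, H = systematic_coset_code([[1, 0, 1], [1, 1, 0]])   # n = 5, k = 3
rng = random.Random(0)
x = coset_encode([1, 0, 1], G, Gp, rng)
print(x, coset_decode(x, H))
```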


Algebraic secrecy criterion 


Since coset coding can be defined in terms of the parity-check matrix and generator 
matrix of a linear code Co, the algebraic structure of the linear code is likely to have 
a critical effect on secrecy. Hence, to clarify this connection and simplify the analysis, 
it is convenient to develop an algebraic secrecy criterion equivalent to the original 
information-theoretic secrecy criterion for coset coding. 

Consider an eavesdropper’s observation $z$ with $\mu$ unerased bits in positions $(i_1, \dots, i_\mu)$. If a sequence $x \in \mathrm{GF}(2)^n$ is such that

$$(x_{i_1}, \dots, x_{i_\mu}) = (z_{i_1}, \dots, z_{i_\mu}),$$

then the sequence is said to be consistent with $z$. If a coset of $\mathcal{C}_0$ contains at least one sequence $x$ that is consistent with $z$, then the coset itself is said to be consistent with $z$. The total number of cosets of $\mathcal{C}_0$ consistent with $z$ is denoted by $N(\mathcal{C}_0, z)$.


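Lemma 6.1. Let $z$ be an eavesdropper’s observation at the output of the BEC. Then all cosets of $\mathcal{C}_0$ that are consistent with $z$ contain the same number of sequences consistent with $z$.

Proof. Let $(i_1, \dots, i_\mu)$ denote the set of unerased positions of $z$. Let $G \in \mathrm{GF}(2)^{(n-k)\times n}$ be a generator matrix of $\mathcal{C}_0$, and let $g_i$ denote the $i$th column of $G$. We define the matrix $G_\mu$ as

$$G_\mu = \begin{pmatrix} g_{i_1} & g_{i_2} & \cdots & g_{i_\mu} \end{pmatrix}.$$

Let $\mathcal{C}_1$ be a coset of $\mathcal{C}_0$ consistent with $z$. Define $\mathcal{E}_1(z) \triangleq \{ x \in \mathcal{C}_1 : x \text{ consistent with } z \}$ and let $x_1 \in \mathcal{E}_1(z)$. We show that $|\mathcal{E}_1(z)| = |\mathrm{Ker}(G_\mu^{\mathsf{T}})|$.

• For any $m \in \mathrm{Ker}(G_\mu^{\mathsf{T}})$, the vector $G^{\mathsf{T}} m$ is in $\mathcal{C}_0$ and contains zeros in positions $(i_1, \dots, i_\mu)$. Therefore, for any $m \in \mathrm{Ker}(G_\mu^{\mathsf{T}})$, $x_1 + G^{\mathsf{T}} m \in \mathcal{E}_1(z)$ and
$$\{ x_1 + G^{\mathsf{T}} m : m \in \mathrm{Ker}(G_\mu^{\mathsf{T}}) \} \subseteq \mathcal{E}_1(z).$$
• Now, assume $x'_1 \in \mathcal{E}_1(z)$. Then $x'_1 + x_1 \in \mathcal{C}_0$ and contains zeros in positions $(i_1, \dots, i_\mu)$. Hence, there exists $m \in \mathrm{Ker}(G_\mu^{\mathsf{T}})$ such that $x'_1 + x_1 = G^{\mathsf{T}} m$. Therefore, $x'_1 \in \{ x_1 + G^{\mathsf{T}} m : m \in \mathrm{Ker}(G_\mu^{\mathsf{T}}) \}$ and
$$\mathcal{E}_1(z) \subseteq \{ x_1 + G^{\mathsf{T}} m : m \in \mathrm{Ker}(G_\mu^{\mathsf{T}}) \}.$$

Hence, $\mathcal{E}_1(z) = \{ x_1 + G^{\mathsf{T}} m : m \in \mathrm{Ker}(G_\mu^{\mathsf{T}}) \}$. Since $G^{\mathsf{T}}$ has full column rank, distinct $m$ yield distinct sequences, and $|\mathcal{E}_1(z)| = |\mathrm{Ker}(G_\mu^{\mathsf{T}})|$. Therefore, any coset of $\mathcal{C}_0$ consistent with $z$ contains exactly $|\mathrm{Ker}(G_\mu^{\mathsf{T}})|$ sequences consistent with $z$.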


Proposition 6.2. Let $z$ be an eavesdropper’s observation at the output of the BEC. Then the eavesdropper’s uncertainty about $M$ given his observation $z$ is

$$H(M|z) = \log N(\mathcal{C}_0, z).$$

Proof. Let $X^n$ be the random variable that represents the sequence sent over the channel. Since $M$ is a function of $X^n$ (namely its syndrome),

$$\begin{aligned} H(M|z) &= H(M X^n | z) - H(X^n | M z) \\ &= H(X^n | z) - H(X^n | M z). \end{aligned}$$

The term $H(X^n|z)$ is the uncertainty in the codeword that was sent given the observation $z$. By the definition of coset coding, all codewords are used with equal probability. Therefore, $H(X^n|z) = \log N$, where $N$ is the number of sequences that are consistent with $z$. Now,

$$H(X^n | M z) = \sum_m H(X^n | M = m, z)\, p_{M|Z}(m|z),$$

and the term $H(X^n|M = m, z)$ is the uncertainty in the sequence that was sent given $z$ and knowing the coset $m$ that was used. By definition, all codewords are used with equal probability. By Lemma 6.1, all cosets consistent with $z$ contain the same number of sequences consistent with $z$. Hence, $H(X^n|M = m, z) = \log N_c$, where $N_c$ is the number of sequences consistent with $z$ in a coset consistent with $z$. Since $N = N(\mathcal{C}_0, z)\, N_c$, we obtain

$$H(M|z) = \log N - \log N_c = \log\left(\frac{N}{N_c}\right) = \log N(\mathcal{C}_0, z).$$

The number of cosets is $2^k$; therefore, $N(\mathcal{C}_0, z) \leq 2^k$. If $N(\mathcal{C}_0, z) = 2^k$, then all cosets are consistent with $z$, and we say that $z$ is secured by the code $\mathcal{C}_0$. The following proposition provides a necessary and sufficient condition for an observation $z$ to be secured by $\mathcal{C}_0$.
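Lemma 6.1 and Proposition 6.2 can be checked by brute force on a toy code. The procedure is to enumerate the sequences consistent with an observation $z$, group them by syndrome, and take $\log_2$ of the number of groups. The matrix below is a hypothetical parity-check matrix for the code spanned by (1, 0, 1, 1, 0) and (0, 1, 0, 1, 1). Erasures are marked with `None`, messages are assumed uniform, and the helper names are illustrative.

```python
import math
from itertools import product

def consistent_coset_counts(H, z):
    """Map each syndrome s whose coset C_0(s) is consistent with the erasure
    observation z (None marks an erasure) to its number of consistent sequences."""
    counts = {}
    for x in product((0, 1), repeat=len(z)):
        if all(zi is None or zi == xi for zi, xi in zip(z, x)):
            s = tuple(sum(h & b for h, b in zip(row, x)) % 2 for row in H)
            counts[s] = counts.get(s, 0) + 1
    return counts

def equivocation_bits(H, z):
    """H(M|z) = log2 N(C_0, z) for uniformly distributed messages."""
    counts = consistent_coset_counts(H, z)
    # Lemma 6.1: every consistent coset holds the same number of consistent sequences
    assert len(set(counts.values())) == 1
    return math.log2(len(counts))

# Hypothetical parity-check matrix with k = 3 (so 2^3 = 8 cosets) and n = 5
H = [(1, 0, 1, 0, 0), (1, 1, 0, 1, 0), (0, 1, 0, 0, 1)]
print(equivocation_bits(H, [0, 1, None, None, None]))   # 3.0: secured
print(equivocation_bits(H, [0, None, 0, None, None]))   # 2.0: not secured
```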


Proposition 6.3 (Ozarow and Wyner). Let $\mathcal{C}_0$ be an $(n, n-k)$ binary linear code with generator matrix $G$, and let $g_i$ denote the $i$th column of $G$. Let $z$ be an observation of the eavesdropper with $\mu$ unerased bits in positions $(i_1, \dots, i_\mu)$. Then $z$ is secured by $\mathcal{C}_0$ if and only if the matrix

$$G_\mu \triangleq \begin{pmatrix} g_{i_1} & g_{i_2} & \cdots & g_{i_\mu} \end{pmatrix}$$

has rank $\mu$.

Proof. Assume that $G_\mu$ has rank $\mu$. Then the map $m \mapsto G_\mu^{\mathsf{T}} m$ is onto $\mathrm{GF}(2)^\mu$, so the code $\mathcal{C}_0$ has codewords with all possible sequences of $\mathrm{GF}(2)^\mu$ in the $\mu$ unerased positions. Since cosets are obtained by translating $\mathcal{C}_0$, all cosets also have codewords with all possible binary sequences in the $\mu$ unerased positions. Therefore, $N(\mathcal{C}_0, z) = 2^k$.

Now, assume that $G_\mu$ has rank strictly less than $\mu$. Then there exists at least one sequence of $\mu$ bits $c_\mu$ that does not appear in any codeword of $\mathcal{C}_0$ in the $\mu$ unerased positions. For any sequence $x' \in \mathrm{GF}(2)^n$, we let $x'_\mu$ denote the bits of $x'$ in the $\mu$ unerased positions. Since the cosets form a partition of $\mathrm{GF}(2)^n$, there exists a sequence $x'$ in a coset $\mathcal{C}'$ such that $x'_\mu \oplus c_\mu$ has the same value as $z$ in the unerased positions. Since $c_\mu$ does not appear in any codeword of $\mathcal{C}_0$, the coset $\mathcal{C}'$ is not consistent with $z$; therefore, $N(\mathcal{C}_0, z) < 2^k$.


As an immediate consequence of Proposition 6.3, we obtain a necessary and sufficient condition for communication in perfect secrecy with respect to an eavesdropper who observes any set of $\mu$ unerased bits.

Corollary 6.1. Let $\mathcal{C}_0$ be an $(n, n-k)$ binary linear code with generator matrix $G$. Coset coding with $\mathcal{C}_0$ guarantees perfect secrecy against an eavesdropper who observes any set of $\mu$ unerased bits if and only if all submatrices of $G$ with $\mu$ columns have rank $\mu$.

Proof. If all submatrices of $G$ with $\mu$ columns have rank $\mu$, then any observation $z$ with $\mu$ unerased positions is secured by $\mathcal{C}_0$ and $H(M|z) = k$. Uniformly distributed messages also satisfy $H(M) = k$. Therefore,

$$\begin{aligned} I(M; Z^n) &= H(M) - H(M|Z^n) \\ &= k - \sum_z p_{Z^n}(z) H(M|z) \\ &= 0. \end{aligned}$$

Conversely, if there exists a submatrix of $G$ with $\mu$ columns and rank less than $\mu$, then there exists an observation $z'$ that is not secured by $\mathcal{C}_0$ and such that $H(M|z') < k$. Therefore,

$$\begin{aligned} I(M; Z^n) &= H(M) - H(M|Z^n) \\ &= k - \sum_z p_{Z^n}(z) H(M|z) \\ &> 0. \end{aligned}$$
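Corollary 6.1 reduces perfect secrecy against any $\mu$ observed bits to a rank test over GF(2). The following Python sketch performs that test exhaustively over all $\binom{n}{\mu}$ column subsets, which is only feasible for small $n$. The matrices and function names are illustrative assumptions.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) via elimination on rows packed as integer bitmasks."""
    vecs = [int("".join(map(str, r)), 2) for r in rows if any(r)]
    rank = 0
    while vecs:
        pivot = max(vecs)                      # row with the highest leading bit
        vecs.remove(pivot)
        top = pivot.bit_length() - 1
        # clear the pivot's leading bit from every other row, drop rows that vanish
        vecs = [v ^ pivot if (v >> top) & 1 else v for v in vecs]
        vecs = [v for v in vecs if v]
        rank += 1
    return rank

def perfect_secrecy_against(G, mu):
    """Corollary 6.1: coset coding with C_0 = rowspace(G) is perfectly secret against
    any mu unerased bits iff every mu-column submatrix of G has rank mu."""
    n = len(G[0])
    return all(gf2_rank([[row[i] for i in cols] for row in G]) == mu
               for cols in combinations(range(n), mu))

G_even = [[1, 1, 0], [0, 1, 1]]              # even-weight code of length 3
G_5 = [[1, 0, 1, 1, 0], [0, 1, 0, 1, 1]]     # hypothetical (5, 2) code
print([perfect_secrecy_against(G_even, mu) for mu in (1, 2, 3)])  # [True, True, False]
print([perfect_secrecy_against(G_5, mu) for mu in (1, 2)])        # [True, False]
```

For the even-weight code of length 3, coset coding hides the single message bit (the parity) from any two observed bits but not from all three.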


Remark 6.2. A binary wiretap channel model in which the eavesdropper is known to access no more than $\mu$ of $n$ transmitted bits is called a wiretap channel of type II. This model differs from the binary erasure wiretap channel of Figure 6.3 in that the eavesdropper can in principle choose which $\mu$ bits are observed. This model was extensively studied by Ozarow and Wyner, and we discuss it in the context of network coding in Chapter 9.


Coset coding with dual of LDPC codes 


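We now go back to the binary erasure wiretap channel of Figure 6.3. In general, we
cannot guarantee that the eavesdropper's observation contains a fixed number $E$ of
unerased symbols with probability one. However, $E$ is a binomial random variable with
mean $n(1-\epsilon)$ and variance $\mathrm{Var}(E) = n\epsilon(1-\epsilon)$, so Chebyshev's inequality gives

$$\forall \beta > 0 \quad \mathbb{P}\left[\left|\frac{E}{n} - (1-\epsilon)\right| \geq \beta\right] \leq \frac{\mathrm{Var}(E)}{n^2\beta^2} = \frac{\epsilon(1-\epsilon)}{n\beta^2}.$$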
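This concentration can be checked numerically. The following Python sketch assumes independent erasures with probability $\epsilon$; the values $\epsilon = 0.58$ and $\beta = 0.05$ and the number of trials are arbitrary illustrative choices. It estimates $\mathbb{P}[|E/n - (1-\epsilon)| \geq \beta]$ by Monte Carlo simulation and compares the estimate with the Chebyshev bound $\epsilon(1-\epsilon)/(n\beta^2)$.

```python
import random


def deviation_probability(n, eps, beta, trials, seed=0):
    """Monte Carlo estimate of P[|E/n - (1 - eps)| >= beta], where E counts the
    unerased positions among n independent erasures of probability eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        unerased = sum(rng.random() >= eps for _ in range(n))
        if abs(unerased / n - (1 - eps)) >= beta:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps, beta):
    """Right-hand side eps (1 - eps) / (n beta^2) of Chebyshev's inequality."""
    return eps * (1 - eps) / (n * beta ** 2)


for n in (100, 1000, 4000):
    print(n, deviation_probability(n, 0.58, 0.05, 300), chebyshev_bound(n, 0.58, 0.05))
```

The bound is loose for small $n$, but both the bound and the empirical frequency decrease as $n$ grows, which is all the argument requires.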

Hence, the fraction of unerased positions is arbitrarily close to $1-\epsilon$ with
high probability as $n$ becomes large. Consequently, we will be able to leverage the
results of Section 6.3.1 if we can find a suitable matrix $G$ for coset coding such
that the observations of the eavesdropper are secured with high probability. It turns
out that parity-check matrices of LDPC codes satisfy the desired condition. In fact,
the threshold property of decoding under message-passing can be interpreted as
follows.

Lemma 6.2. Let $H$ be the parity-check matrix of a length-$n$ LDPC code selected
uniformly at random in an ensemble whose block error probability threshold under
belief-propagation decoding for the erasure channel is $\alpha^*$. Form a submatrix $H'$ of $H$ by
selecting each column of $H$ with probability $\alpha < \alpha^*$. Then,

$$\mathbb{P}\left[\mathrm{rk}(H') = \alpha n\right] = 1 - \delta(n).$$


Proof. By Theorem 6.3 and Theorem 6.4, with probability $1 - \delta(n)$, $H$ is such that the
block error probability under message-passing decoding vanishes as $n$ goes to infinity if
$\alpha < \alpha^*$. In other words, if the erased bits in a given observation $z$ are treated as unknowns,
the equation $Hz = 0$ has a unique solution. Without loss of generality, we assume that
the first bits of $z$ are erased and rewrite the equation $Hz = 0$ as $(H'\;H'')z = 0$, where $H'$
corresponds to the erased positions of $z$. This equation has a unique solution if and only
if $H'$ has full column rank $\alpha n$.
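The link between message-passing decoding and the rank of $H'$ used in this proof can be illustrated on a small example. The Python sketch below runs the peeling form of message-passing erasure decoding (repeatedly solving any parity check that involves exactly one erased bit) and compares its outcome with the GF(2) rank of the erased columns. The function names are hypothetical, and we use the parity-check matrix of the (7, 4) Hamming code only because it is small; it is not an LDPC code.

```python
def peel(H, erased):
    """Peeling (message-passing) erasure decoder; H is a list of binary rows.
    Returns the set of erased positions that could not be resolved."""
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        for row in H:
            unknown = [j for j, h in enumerate(row) if h and j in erased]
            if len(unknown) == 1:  # this check determines its only unknown bit
                erased.discard(unknown[0])
                progress = True
    return erased


def rank_of_columns(H, cols):
    """GF(2) rank of the submatrix H' formed by the given columns of H."""
    pivots = {}
    for j in cols:
        v = sum(row[j] << r for r, row in enumerate(H))  # column j as an int
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


# Column j of H is the binary expansion of j + 1 (Hamming (7, 4) code).
H = [[((j + 1) >> r) & 1 for j in range(7)] for r in range(3)]

for erased in ([0, 1], [0, 1, 2], [2, 4, 6]):
    print(erased, "unresolved:", sorted(peel(H, erased)),
          "rank of H':", rank_of_columns(H, erased))
```

The first pattern is resolved by peeling and $H'$ has full column rank; in the second, $H'$ is rank-deficient, so no decoder can succeed. The third pattern is a stopping set: $H'$ has full rank, yet peeling stalls. The proof only uses the direction from decoding success to full column rank.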


Theorem 6.5 (Thangaraj et al.). Let $\mathcal{C}_0$ be an $(n, k)$ binary LDPC code selected
uniformly at random in an ensemble with erasure threshold $\alpha^*$ and let $\mathcal{C}_0^\perp$ be its dual, an
$(n, n-k)$ code. Then, as $n$ goes to infinity, coset coding with $\mathcal{C}_0^\perp$ and its cosets guarantees
(weak) secrecy at rate $R = k/n$ over any binary erasure wiretap channel with erasure probability
$\epsilon > 1 - \alpha^*$.


Proof. By definition, a generator matrix $G$ of $\mathcal{C}_0^\perp$ is a parity-check matrix $H$ of $\mathcal{C}_0$.
Therefore, by Lemma 6.2, if $1 - \epsilon < \alpha^*$, any submatrix formed by selecting columns
of $G$ with probability $1 - \epsilon$ has full column rank with probability $1 - \delta(n)$; hence, by
Proposition 6.3, any observation $Z^n$ of the eavesdropper is secured by $\mathcal{C}_0^\perp$ with probability
$1 - \delta(n)$. Equivalently, if we let $Q$ be the random variable defined as

$$Q \triangleq \begin{cases} 1 & \text{if } Z^n \text{ is secured by } \mathcal{C}_0^\perp,\\ 0 & \text{else,} \end{cases}$$

then $\mathbb{P}[Q = 1] \geq 1 - \delta(n)$. Consequently,

$$\begin{aligned}
\frac{1}{n} I(M; Z^n) &\leq \frac{1}{n} I(M; Z^n Q)\\
&= \frac{1}{n}\left(H(M) - H(M|Z^n Q)\right)\\
&= \frac{1}{n}\left(H(M) - H(M|Z^n, Q = 1)\,\mathbb{P}[Q = 1] - H(M|Z^n, Q = 0)\,\mathbb{P}[Q = 0]\right)\\
&\leq \frac{1}{n}\left(H(M) - H(M)(1 - \delta(n))\right)\\
&= \frac{H(M)}{n}\,\delta(n)\\
&\leq \delta(n).
\end{aligned}$$

In other words, coset coding with $\mathcal{C}_0^\perp$ guarantees weak secrecy at rate $k/n$.


Note that the analysis of coset coding with the dual of LDPC codes does not rely on any 
capacity-achieving property; our proof relies solely on the concentration and threshold 
properties established by Theorem 6.3 and Theorem 6.4. Nevertheless, as illustrated by 
the following examples, the price paid is that we cannot achieve rates arbitrarily close 
to the secrecy capacity. 


Example 6.4. A rate-1/2 (3, 6) regular LDPC code, with erasure threshold $\alpha^* \approx 0.42$, can
be used for secure communication over any binary erasure wiretap channel with erasure
probability $\epsilon > 1 - \alpha^* = 0.58$. The communication rate is 0.5, which is at most 86%
of the secrecy capacity.
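The erasure threshold of a regular ensemble can be estimated with density evolution. The sketch below is a standard computation rather than one taken from this chapter: for a $(d_v, d_c)$-regular ensemble on the erasure channel, it iterates $x_{l+1} = \alpha\,(1 - (1 - x_l)^{d_c - 1})^{d_v - 1}$ and bisects on $\alpha$. The iteration cap, tolerance, and number of bisection steps are arbitrary choices, and the function names are ours.

```python
def decodes(alpha, dv, dc, max_iter=20000, tol=1e-10):
    """Density evolution on the BEC: True if the erasure fraction of the
    variable-to-check messages vanishes for erasure probability alpha."""
    x = alpha
    for _ in range(max_iter):
        x = alpha * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False


def bec_threshold(dv, dc, steps=30):
    """Bisection for the largest alpha at which density evolution succeeds."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if decodes(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo


alpha_star = bec_threshold(3, 6)
print(round(alpha_star, 4))  # about 0.4294
# Rate divided by the smallest admissible erasure probability 1 - alpha*.
print(round(0.5 / (1 - alpha_star), 3))
```

For the (3, 6) ensemble, this returns about 0.429, slightly above the value $\alpha^* \approx 0.42$ used in Example 6.4; the smaller value only makes the condition $\epsilon > 1 - \alpha^*$ more conservative.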


Degrading erasure channels 


As shown in Proposition 3.3, a wiretap code designed for a specific eavesdropper's
channel $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$ can be used over any other eavesdropper's channel that is stochastically
degraded with respect to $(\mathcal{X}, p_{Z|X}, \mathcal{Z})$. Codes designed for an erasure eavesdropper's
channel are therefore useful over a much broader class of channels. Nevertheless, rates
are then bounded strictly below the secrecy capacity since the full characteristics of the
eavesdropper's channel are not necessarily exploited. The following proposition shows
that all binary-input channels are stochastically degraded with respect to some binary
erasure channel.


Proposition 6.4. A memoryless binary-input channel $(\{0,1\}, p_{Y|X}, \mathcal{Y})$ (the alphabet
$\mathcal{Y}$ may be continuous or finite) is stochastically degraded with respect to an erasure
channel with erasure probability

$$\epsilon = \int_{\mathcal{Y}} \left(\min_{u \in \{0,1\}} p_{Y|X}(y|u)\right) dy.$$


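Proof. Since $\int_{\mathcal{Y}} p_{Y|X}(y|x)\,dy = 1$ for any $x$ and $p_{Y|X}(y|x) \geq 0$ for any $x$ and $y$, it is clear
that $\epsilon \in [0, 1]$. Let $(\{0,1\}, p_{Z|X}, \{0,1,?\})$ be an erasure channel with erasure probability
$\epsilon$ defined as above, that is

$$\forall x \in \{0,1\} \quad p_{Z|X}(?|x) = \epsilon \quad\text{and}\quad p_{Z|X}(x|x) = 1 - \epsilon.$$

Consider the channel $(\{0,1,?\}, p_{Y|Z}, \mathcal{Y})$ such that

$$p_{Y|Z}(y|?) = \frac{1}{\epsilon} \min_{u \in \{0,1\}} p_{Y|X}(y|u),$$

$$p_{Y|Z}(y|z) = \frac{1}{1-\epsilon}\left(p_{Y|X}(y|z) - \min_{u \in \{0,1\}} p_{Y|X}(y|u)\right) \quad \text{if } z \in \{0,1\}.$$

One can check that these are valid transition probabilities with the value of $\epsilon$ above. In
addition, for any $(x, y) \in \{0,1\} \times \mathcal{Y}$,

$$\begin{aligned}
\sum_{z \in \{0,1,?\}} p_{Y|Z}(y|z)\,p_{Z|X}(z|x) &= p_{Y|Z}(y|?)\,\epsilon + \sum_{z \in \{0,1\}} p_{Y|Z}(y|z)\,(1-\epsilon)\,\mathbb{1}\{z = x\}\\
&= \min_{u \in \{0,1\}} p_{Y|X}(y|u) + p_{Y|X}(y|x) - \min_{u \in \{0,1\}} p_{Y|X}(y|u)\\
&= p_{Y|X}(y|x).
\end{aligned}$$

Therefore, the channel $(\{0,1\}, p_{Y|X}, \mathcal{Y})$ is stochastically degraded with respect to a
binary erasure channel with erasure probability $\epsilon$.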
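For a channel with a finite output alphabet, the construction in this proof is easy to check numerically. The Python sketch below computes $\epsilon$ and the channel $p_{Y|Z}$ for a binary-input DMC given as a transition matrix, and verifies that cascading the erasure channel with $p_{Y|Z}$ recovers $p_{Y|X}$. It also evaluates the closed form $\epsilon = \mathrm{erfc}(1/(\sqrt{2}\sigma))$ that Proposition 6.4 gives for inputs $\pm 1$ over an AWGN channel with noise variance $\sigma^2$, which is the value used in Example 6.6. The function names are hypothetical, and the sketch assumes $0 < \epsilon < 1$.

```python
import math


def degrading_bec(P):
    """P[x][y] = p_{Y|X}(y|x) for x in {0, 1}. Returns (eps, Q) as in the proof of
    Proposition 6.4, where Q[z][y] = p_{Y|Z}(y|z) for z in {0, 1, '?'}.
    Assumes 0 < eps < 1 so that both normalizations are well defined."""
    mins = [min(P[0][y], P[1][y]) for y in range(len(P[0]))]
    eps = sum(mins)
    Q = {'?': [m / eps for m in mins]}
    for z in (0, 1):
        Q[z] = [(P[z][y] - m) / (1 - eps) for y, m in enumerate(mins)]
    return eps, Q


def cascade(eps, Q):
    """Transition matrix of a BEC(eps) followed by the channel Q."""
    return [[eps * Q['?'][y] + (1 - eps) * Q[x][y] for y in range(len(Q['?']))]
            for x in (0, 1)]


def biawgn_eps(sigma2):
    """Erasure probability for inputs +/-1 and Gaussian noise of variance sigma2."""
    return math.erfc(1 / math.sqrt(2 * sigma2))


p = 0.29
bsc = [[1 - p, p], [p, 1 - p]]
eps, Q = degrading_bec(bsc)
print(eps)               # 2p for a BSC
print(cascade(eps, Q))   # reproduces the BSC transition matrix
print(biawgn_eps(3.28))  # about 0.58
```

For the binary symmetric channel, the degrading channel simply passes unerased bits through and replaces erasures with a uniformly random bit.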


The following examples are direct applications of Proposition 6.4 and Proposition 3.3. 


Example 6.5. Consider a binary symmetric wiretap channel, in which the main channel is
noiseless and the eavesdropper's channel is binary symmetric with cross-over probability
$p < \frac{1}{2}$. A code designed for an erasure wiretap channel with erasure probability $\epsilon^* = 2p$
could achieve a secure rate $\epsilon^*$. Figure 6.4 shows the secrecy capacity $C_s = H_b(p)$ and the
achievable rates as a function of the cross-over probability $p$. For $p = 0.29$, $\epsilon^* = 0.58$,
and we can use the irregular code of Example 6.1 and its cosets for secure communication.
Note that the rate of secure communication is 0.5 bits per channel use, compared with
the secrecy capacity $C_s \approx 0.86$ bits per channel use.


Example 6.6. Consider a Gaussian wiretap channel in which the main channel is noiseless
and the eavesdropper's channel is an AWGN channel with noise variance $\sigma^2$. Let us
restrict our attention to binary inputs $x \in \{-1, +1\}$. A code designed for an erasure
wiretap channel with erasure probability

$$\epsilon^* = \mathrm{erfc}\left(\frac{1}{\sqrt{2}\,\sigma}\right)$$

will achieve a secure rate $\epsilon^*$. Figure 6.5 shows the secrecy capacity of the binary-input
Gaussian wiretap channel and the achievable rates as a function of the eavesdropper's
channel variance $\sigma^2$. For $\sigma^2 = 3.28$, $\epsilon^* \approx 0.58$, and we can again use the irregular code
of Example 6.1. The communication rate is 0.5 bits per channel use, compared with the
secrecy capacity $C_s \approx 0.81$ bits per channel use.

Figure 6.4 Secrecy capacity and achievable rates for a WTC with noiseless main channel and
BSC($p$) eavesdropper's channel.

Figure 6.5 Secrecy capacity and achievable rates for a WTC with noiseless main channel and
binary-input Gaussian eavesdropper's channel.


Reconciliation of binary memoryless sources 


As seen in Section 4.3, one can generate secret keys from a DMS by performing
information reconciliation followed by privacy amplification. Hash functions for privacy
amplification are already known (see for instance Example 4.5 and Example 4.6), and the
only missing piece for implementing a complete key-distillation strategy is an efficient
information-reconciliation protocol. For clarity, this section focuses solely on binary
memoryless sources, and the extension to arbitrary discrete sources and continuous
sources is relegated to Section 6.5.

Figure 6.6 Source coding with side information with syndromes of linear codes.

Since Proposition 4.5 shows that, without loss of optimality, the reconciliation of
discrete random variables can be treated as a problem of source coding with side
information, we need only design an encoder for a binary memoryless source $X$ so that a
receiver, who has access to a correlated binary memoryless source $Y$, retrieves $X$ with
arbitrarily small probability of error. The Slepian–Wolf theorem (Theorem 2.10) guarantees
the existence of codes compressing at a rate arbitrarily close to $H(X|Y)$ bits per
source symbol, but does not provide explicit constructions. As illustrated in Figure 6.6, one can
build source encoders from well-chosen linear codes. Given the parity-check matrix
$H \in \mathrm{GF}(2)^{k \times n}$ of a linear code, a vector of $n$ observations $x$ is encoded by computing
the syndrome $s = Hx$. Upon reception of $s$, the decoder can minimize its probability of
error by looking for the sequence $x$ that maximizes the a-posteriori probability

$$\mathbb{P}[x \,|\, y, s = Hx].$$

This procedure is equivalent to maximum a-posteriori (MAP) estimation of $x$ within the
coset code with syndrome $s$. One can show that there exist good linear codes such that
the probability of error can be made as small as desired, provided that the number of
syndrome bits $k$ is at least $nH(X|Y)$. Note that the code rate of the linear code used for
syndrome coding is $1 - k/n$, while the compression rate is $k/n$; in the remainder of this
chapter, we explicitly specify which rate is being considered in order to avoid confusion.

In practice, the computational complexity required to perform MAP decoding is
prohibitive, but we can use LDPC codes and adapt the message-passing algorithm
described in Table 6.1 to obtain a suboptimal yet efficient algorithm. Essentially, given
a syndrome $s$ and a sequence of observations $y$, the key idea is to slightly modify the
intrinsic LLRs to account for the value of the syndrome and the a-priori distribution
of $X$. To describe the modified message-passing algorithm, we introduce the following
notation. The $n$-bit vector observed by the encoder is denoted $x = (x_1 \dots x_n)^T$, while
the $n$-bit vector available to the decoder as side information is denoted $y = (y_1 \dots y_n)^T$.
The decoder receives the syndrome vector $s = (s_1 \dots s_k)^T$, whose entries correspond to
the values of check nodes $\{c_1, \dots, c_k\}$ in the Tanner graph of the LDPC code. The set of
check-node indices connected to a variable node $x_i$ is denoted by $\mathcal{N}(i)$, while the set of
variable-node indices connected to a check node $c_j$ is denoted by $\mathcal{M}(j)$. The algorithm
is described in Table 6.2.

Table 6.2 Belief-propagation algorithm for source coding with side information

Initialization.
▷ For each $i \in [1, n]$ and for each $j \in \mathcal{N}(i)$
  $v^{(0)}_{ij} = u^{(0)}_{ji} = 0.$
▷ For each $i \in [1, n]$
  $\lambda^{\mathrm{INT}}_i = \log\left(\dfrac{p_{XY}(0, y_i)}{p_{XY}(1, y_i)}\right).$
Iterations. For each iteration $l \in [1, l_{\max}]$
▷ For each $i \in [1, n]$ and for each $j \in \mathcal{N}(i)$
  $v^{(l)}_{ij} = \lambda^{\mathrm{INT}}_i + \sum_{m \in \mathcal{N}(i)\setminus\{j\}} u^{(l-1)}_{mi}.$
▷ For each $j \in [1, k]$ and for each $i \in \mathcal{M}(j)$
  $u^{(l)}_{ji} = 2\tanh^{-1}\left((1 - 2s_j) \prod_{m \in \mathcal{M}(j)\setminus\{i\}} \tanh\left(\dfrac{v^{(l)}_{mj}}{2}\right)\right).$
Extrinsic information. For all $i \in [1, n]$
  $\lambda^{\mathrm{EXT}}_i = \sum_{m \in \mathcal{N}(i)} u^{(l_{\max})}_{mi}.$
Hard decisions. For all $i \in [1, n]$
  $\hat{x}_i = \frac{1}{2}\left(1 - \mathrm{sign}\left(\lambda^{\mathrm{INT}}_i + \lambda^{\mathrm{EXT}}_i\right)\right).$

Compared with the algorithm in Table 6.1, note that the initialization of the LLRs is
now based on the joint distribution $p_{XY}(x, y)$ and that the $j$th syndrome value $s_j$ affects
the sign of the messages $u^{(l)}_{ji}$.
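A compact implementation makes Table 6.2 concrete. The Python sketch below follows the table's flooding schedule under the assumptions of Example 6.7 (uniform $X$, and $Y$ obtained by passing $X$ through a BSC with cross-over probability $p$), so that $\lambda^{\mathrm{INT}}_i = (1 - 2y_i)\log((1-p)/p)$. The function name, the clipping of the tanh product (added only to keep $\tanh^{-1}$ finite), and the toy parity-check matrix are our own choices; a practical decoder would use sparse data structures and an early-stopping rule.

```python
import math


def bp_syndrome_decode(H, s, y, p, max_iter=20):
    """Belief propagation for source coding with side information (Table 6.2).

    H: parity-check matrix as a list of k binary rows of length n.
    s: syndrome bits s_j; y: side-information bits y_i.
    Assumes uniform X and Y = X xor Bernoulli(p) noise.
    Returns the hard decisions x_hat.
    """
    k, n = len(H), len(H[0])
    N = [[j for j in range(k) if H[j][i]] for i in range(n)]  # checks of variable i
    M = [[i for i in range(n) if H[j][i]] for j in range(k)]  # variables of check j
    lam_int = [(1 - 2 * yi) * math.log((1 - p) / p) for yi in y]
    u = {(j, i): 0.0 for j in range(k) for i in M[j]}  # check-to-variable messages
    for _ in range(max_iter):
        # Variable-to-check messages v_ij, using u from the previous iteration.
        v = {(i, j): lam_int[i] + sum(u[(m, i)] for m in N[i] if m != j)
             for i in range(n) for j in N[i]}
        # Check-to-variable messages; the syndrome bit s_j sets the sign.
        for j in range(k):
            for i in M[j]:
                prod = 1 - 2 * s[j]
                for m in M[j]:
                    if m != i:
                        prod *= math.tanh(v[(m, j)] / 2)
                prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
                u[(j, i)] = 2 * math.atanh(prod)
    lam_ext = [sum(u[(j, i)] for j in N[i]) for i in range(n)]
    # x_hat_i = (1 - sign(lambda_int + lambda_ext)) / 2, with ties broken toward 0.
    return [0 if lam_int[i] + lam_ext[i] >= 0 else 1 for i in range(n)]


# Toy example on a cycle-free Tanner graph (a chain), where BP is exact.
H = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
x = [0, 1, 1, 0]
s = [sum(h * b for h, b in zip(row, x)) % 2 for row in H]  # encoder output
y = [0, 1, 0, 0]  # side information with one bit flipped
print(bp_syndrome_decode(H, s, y, p=0.1))  # [0, 1, 1, 0]
```

Because this toy graph has no cycles, the decoder returns the bitwise MAP estimate, which here recovers $x$ despite the flipped bit in $y$.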


Example 6.7. To illustrate the performance of the algorithm, we consider a uniform
binary memoryless source $X$, and $Y$ is obtained by sending $X$ through a binary symmetric
channel with cross-over probability $p$. In principle, one can reconstruct the source
$X^n$ provided that it is compressed at a rate of at least $H_b(p)$. For a code of rate 1/2, the
efficiency is $\beta = 1/(2(1 - H_b(p)))$. Figure 6.7 shows the bit-error-rate versus efficiency
performance of syndrome coding using LDPC codes of length 10* with the degree 
distributions given in Example 6.1 and Example 6.2. These non-optimized degree
distributions already provide over 80% efficiency at an error rate of $10^{-5}$. Longer codes
with optimized degree distributions can easily achieve over 90% efficiency.
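The efficiency expression of Example 6.7 is easy to tabulate. The Python sketch below evaluates $\beta = (1 - R)/(1 - H_b(p))$, which reduces to $1/(2(1 - H_b(p)))$ for compression rate $R = 1/2$; the general form is our assumption, obtained from $\beta = (H(X) - R)/I(X;Y)$ with $H(X) = 1$. It also finds, by bisection, the cross-over probability at which a given target efficiency is reached. The target values are arbitrary.

```python
import math


def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def efficiency(p, compression_rate=0.5):
    """beta = (1 - R) / (1 - H_b(p)) for a uniform binary source with BSC(p)
    side information; R = 1/2 gives 1 / (2 (1 - H_b(p)))."""
    return (1 - compression_rate) / (1 - hb(p))


def crossover_for_efficiency(beta, compression_rate=0.5):
    """Bisection for p in (0, 0.49) with efficiency(p) = beta (increasing in p)."""
    lo, hi = 1e-12, 0.49
    for _ in range(100):
        mid = (lo + hi) / 2
        if efficiency(mid, compression_rate) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for beta in (0.8, 0.9, 1.0):
    print(beta, round(crossover_for_efficiency(beta), 4))
```

An efficiency of 1 corresponds to $H_b(p) = 1/2$, that is, $p \approx 0.11$; a practical rate-1/2 code operating at efficiency $\beta < 1$ therefore requires a smaller cross-over probability.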
Figure 6.7 Error rate versus efficiency for reconciliation of binary random variables based on
LDPC codes. The regular and irregular codes are rate-1/2 codes with degree distributions given in
Example 6.1 and Example 6.2 and length 10*. The optimized code is the rate-1/2 code of [58] with
length 5 x 10°. 


Remark 6.3. To ensure the generation of identical keys with privacy amplification,
note that the sequence $x$ should be reconstructed exactly, and even a bit error rate
as low as $10^{-5}$ is not acceptable. The presence of errors is usually well detected by
the message-passing decoder, and the key-distillation process could simply be aborted 
in such cases. However, discarding an entire sequence that contains a small fraction 
of errors incurs a significant efficiency loss. A more efficient technique consists of 
concatenating a high-rate outer code, such as a BCH code, to correct the few remaining 
errors. 


Reconciliation of general memoryless sources 


We now turn our attention to a general memoryless source $(\mathcal{X}\mathcal{Y}, p_{XY})$, which might
be discrete or continuous. If $X$ is a general discrete random variable, Proposition 4.5
applies once again, and reconciliation can be treated as a Slepian–Wolf coding problem.
If $X$ is a continuous random variable, its lossless reconstruction would require infinitely
many bits, and the traditional approach is to consider the approximate reconstruction
of $X$ under a distortion constraint. However, the objective of reconciliation is slightly
different because we want to extract a common (binary) sequence from observations of
the components $X$ and $Y$ so that privacy amplification can be used later on. Therefore,
a pragmatic approach consists of quantizing $X$ into a discrete random variable $X'$ to
revert back to the discrete case. In principle, the source $X'$ can be compressed at a rate
$R$ arbitrarily close to $H(X'|Y)$, and the resulting reconciliation efficiency is

$$\beta = \frac{H(X') - H(X'|Y)}{I(X;Y)} = \frac{I(X';Y)}{I(X;Y)}. \qquad (6.2)$$

In general, $\beta < 1$, but the penalty inflicted by quantization can be made as small as
desired by choosing a fine enough quantizer.


Remark 6.4. A scalar quantizer is sufficient to obtain near-optimal performance as
long as there is no rate constraint on public communication. However, for the same
amount of information exchanged over the public channel, a vector quantizer can
achieve better performance than a scalar quantizer.


Multilevel reconciliation 


In this section, we describe a generic protocol for the reconciliation of a memoryless
source $(\mathcal{X}\mathcal{Y}, p_{XY})$ for which $|\mathcal{X}| < \infty$. As discussed above, this protocol is also useful
for continuous random variables, and we study the case of Gaussian random variables
in Section 6.5.2.

In principle, we could design a Slepian–Wolf code that operates on symbols in the
alphabet $\mathcal{X}$ directly, but it is more convenient to use binary symbols only. Letting
$\ell = \lceil \log|\mathcal{X}| \rceil$, every symbol $x \in \mathcal{X}$ can be assigned a unique $\ell$-bit binary label denoted
by the vector $(g_1(x) \dots g_\ell(x))^T$, where

$$\forall i \in [1, \ell] \quad g_i : \mathcal{X} \to \mathrm{GF}(2).$$


The source $X$ is then equivalent to a binary vector source $B^\ell \triangleq (g_1(X) \dots g_\ell(X))^T$.
For $i \in [1, \ell]$, we call the component source $B_i \triangleq g_i(X)$ the $i$th binary level, since it
corresponds to the $i$th “level” in the binary representation of $X$. If we were to try to
encode and decode the $\ell$ binary levels independently with ideal Slepian–Wolf codes, we
would use a compression rate $R_i$ arbitrarily close to $H(B_i|Y)$ for each level $i \in [1, \ell]$,
and the overall compression rate would be

$$\sum_{i=1}^{\ell} H(B_i|Y) \geq \sum_{i=1}^{\ell} H(B_i|B^{i-1}Y) = H(B^\ell|Y) = H(X|Y),$$

with equality if and only if $B_i$ is independent of $B^{i-1}$ given $Y$ for all $i \in [1, \ell]$. Therefore,
in general, encoding and decoding the levels separately is suboptimal.

Figure 6.8 Example of a multilevel reconciliation protocol for $\ell = 3$. Each level $B_i$ is decoded
using $Y^n$ and previously decoded levels as side information.

It is possible to achieve better compression by considering the separate encoding/joint-decoding
procedure illustrated in Figure 6.8. We start by considering the binary source
$B_1$ and let $\mathcal{E}_1$ be the event representing the occurrence of a decoding error. For any
$\gamma, \epsilon > 0$, the Slepian–Wolf theorem guarantees the existence of a code compressing $B_1$
at rate $R_1 = H(B_1|Y) + \gamma$ with probability of error $\mathbb{P}[\mathcal{E}_1] \leq \epsilon$. Assuming that level $B_1$ is
successfully decoded, we can now treat it as additional side information available to the
receiver. Hence, we encode $B_2$ using a code with compression rate $R_2 = H(B_2|B_1Y) + \gamma$
that guarantees a probability of error of $\mathbb{P}[\mathcal{E}_2|\mathcal{E}_1^c] \leq \epsilon$. By repeating this procedure for
each level $i \in [1, \ell]$, $B_i$ is encoded independently at a rate $R_i = H(B_i|B^{i-1}Y) + \gamma$
with a code ensuring a probability of decoding error of $\mathbb{P}[\mathcal{E}_i \,|\, \bigcap_{j<i} \mathcal{E}_j^c] \leq \epsilon$. However,
the levels must be decoded successively, using the previously decoded levels as side
information. The overall probability of error is


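$$P_e \triangleq \mathbb{P}\left[\bigcup_{i=1}^{\ell} \mathcal{E}_i\right] = \sum_{i=1}^{\ell} \mathbb{P}\left[\mathcal{E}_i \cap \bigcap_{j<i} \mathcal{E}_j^c\right] \leq \ell\epsilon,$$

which can be made as small as desired with a large enough blocklength $n$ since the
number of levels $\ell$ is fixed. In addition, the overall compression rate is then

$$R_{\mathrm{tot}} = \sum_{i=1}^{\ell} H(B_i|B^{i-1}Y) + \ell\gamma = H(B^\ell|Y) + \ell\gamma = H(X|Y) + \ell\gamma,$$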

which can be made as close as desired to the optimal compression rate H(X|Y). This 
optimal encoding/decoding scheme is called multilevel reconciliation because of its 
similarity to multilevel channel coding with multistage decoding. 


Remark 6.5. The optimality of multilevel reconciliation does not depend on the labeling
$\{g_i\}_{i=1}^{\ell}$ used to transform $X$ into the vector of binary sources $(B_1 \dots B_\ell)^T$. However, different
choices of labeling result in different values for the compression rates $R_i = H(B_i|B^{i-1}Y)$
for $i \in [1, \ell]$, which can in turn have an effect on the complexity of the code design. For
instance, it can be quite difficult to design high-rate or low-rate codes with near-optimal
performance. We provide heuristic guidelines for the choice of labeling in Section 6.5.2.

Figure 6.9 Illustration of a message-passing algorithm for multilevel reconciliation. The set of check
nodes connected to a variable node $b_{ij}$ is denoted $\mathcal{N}(ij)$. The set of variable nodes connected to a
check node $c_{ik}$ is denoted $\mathcal{M}(ik)$. The set of variable nodes connected to a demapper node $d_j$ is
denoted $\mathcal{O}(j)$.


As in Section 6.4, we can implement multilevel reconciliation efficiently using syndrome
coding with LDPC codes for each of the $\ell$ levels. Specifically, given a vector
of realizations $x = (x_1 \dots x_n)$ of the source $X$, the labeling creates $\ell$ binary vectors
$b_i = g_i(x)$ for $i \in [1, \ell]$, which can be represented in matrix form as

$$\begin{pmatrix} b_1 \\ \vdots \\ b_i \\ \vdots \\ b_\ell \end{pmatrix} = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ \vdots & & & \vdots \\ b_{i1} & b_{i2} & \cdots & b_{in} \\ \vdots & & & \vdots \\ b_{\ell 1} & b_{\ell 2} & \cdots & b_{\ell n} \end{pmatrix} \begin{matrix} \text{level } 1 \\ \\ \text{level } i \\ \\ \text{level } \ell \end{matrix}$$

For each level $i \in [1, \ell]$, the encoder computes the syndromes $s_i = H_i b_i$ with the
parity-check matrix $H_i \in \mathrm{GF}(2)^{k_i \times n}$ of an $(n, n-k_i)$ linear code. Note that, for a fixed
$j \in [1, n]$, the bits $\{B_{1j}, B_{2j}, \dots, B_{\ell j}\}$ are correlated because they stem from the same
symbol $X_j$. Consequently, the intrinsic LLR of every bit $b_{ij}$ with $i \in [1, \ell]$ depends
not only on the side information $y_j$ but also on the estimation of the bits $\{b_{mj}\}_{m=1}^{\ell} \setminus \{b_{ij}\}$.
Therefore, the message-passing algorithm described in Section 6.4 must be modified to
take into account the result of previously decoded levels.

To describe the full message-passing algorithm, we introduce the following notation,
which is also illustrated in Figure 6.9 for convenience. For each level $i \in [1, \ell]$, the
encoder computes the vector $b_i = g_i(x) = (b_{i1} \dots b_{in})^T$, where each $b_{ij}$ corresponds
to the $i$th bit in the binary description of symbol $x_j$. The receiver, who obtains
$y = (y_1 \dots y_n)^T$ as side observation, receives the syndrome vector $s_i = H_i b_i$ of $k_i$ bits,
whose entries correspond to the values of check nodes $\{c_{i1}, \dots, c_{ik_i}\}$ in the Tanner graph
of the LDPC code for level $i$. The set of check-node indices connected to a variable
node $b_{ij}$ is denoted by $\mathcal{N}(ij)$. The set of variable-node indices connected to a check
node $c_{ik}$ is denoted by $\mathcal{M}(ik)$. Finally, the update of the intrinsic LLRs is represented by
$n$ demapper nodes $\{d_j\}$, connecting the Tanner graphs of individual levels. The set of
variable nodes $\{b_{1j}, \dots, b_{\ell j}\}$ stemming from the same symbol $x_j$ is denoted by $\mathcal{O}(j)$.
One can show that the intrinsic LLR $\lambda^{\mathrm{INT}}_{ij}$ of bit $b_{ij}$ should be calculated from symbol
$y_j$ and the extrinsic LLRs $\lambda^{\mathrm{EXT}}_{mj}$ of bits $\{b_{mj}\}$ for $m \in \mathcal{O}(j)\setminus\{i\}$ as


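$$\lambda^{\mathrm{INT}}_{ij} = \log \frac{\displaystyle\sum_{x \in \mathcal{X}:\, g_i(x) = 0} p_{XY}(x, y_j) \exp\Big(\sum_{m \in \mathcal{O}(j)\setminus\{i\}} (1 - g_m(x))\,\lambda^{\mathrm{EXT}}_{mj}\Big)}{\displaystyle\sum_{x \in \mathcal{X}:\, g_i(x) = 1} p_{XY}(x, y_j) \exp\Big(\sum_{m \in \mathcal{O}(j)\setminus\{i\}} (1 - g_m(x))\,\lambda^{\mathrm{EXT}}_{mj}\Big)} \qquad (6.3)$$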


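Note that the extrinsic LLRs $\lambda^{\mathrm{EXT}}_{mj}$ appear as weighting factors in front of the joint
probability terms $p_{XY}(x, y_j)$. As the magnitude of $\lambda^{\mathrm{EXT}}_{mj}$ increases (that is, the estimation
of $b_{mj}$ becomes more reliable), some symbols $x$ are given more importance
than others in the calculation of $\lambda^{\mathrm{INT}}_{ij}$. For clarity, we denote the operation defined
by (6.3) as

$$\lambda^{\mathrm{INT}}_{ij} = \Phi_i\left(y_j, \{\lambda^{\mathrm{EXT}}_{mj}\}_{m \in \mathcal{O}(j)\setminus\{i\}}\right).$$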


The entire decoding algorithm is described in Table 6.3.
If the overall graph contains no cycles, the algorithm can be shown to compute exactly
the a-posteriori LLRs

$$\log\left(\frac{\mathbb{P}[B_{ij} = 0 \,|\, y, s_1, \dots, s_\ell]}{\mathbb{P}[B_{ij} = 1 \,|\, y, s_1, \dots, s_\ell]}\right).$$

In practice, finite-length LDPC codes contain cycles, but the algorithm still provides
reasonable approximations of the real a-posteriori LLRs.
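The demapper update (6.3) is easy to code for a finite alphabet. In the Python sketch below, which is illustrative and uses hypothetical names, `labels[x]` holds the bits $(g_1(x), \dots, g_\ell(x))$ with level indices starting at 0, `p_xy(x, y)` returns the joint probability, and `ext[m]` is the extrinsic LLR $\lambda^{\mathrm{EXT}}_{mj}$ of the other bits of the same symbol. The weight $\exp((1 - g_m(x))\lambda^{\mathrm{EXT}}_{mj})$ is, up to a factor common to numerator and denominator, the prior that an LLR of $\lambda^{\mathrm{EXT}}_{mj}$ assigns to a bit equal to $g_m(x)$.

```python
import math


def demapper_llr(i, y, p_xy, labels, ext):
    """Intrinsic LLR of level i for one symbol, following (6.3).

    labels: dict mapping each symbol x to its tuple of bits (g_1(x), ..., g_l(x)).
    p_xy:   function returning the joint probability p_XY(x, y).
    ext:    extrinsic LLRs of all levels for this symbol (entry i is ignored).
    """
    num = den = 0.0
    for x, bits in labels.items():
        # exp((1 - g_m(x)) * lambda_m) favours symbols whose m-th bit is 0 when lambda_m > 0.
        log_w = sum((1 - bits[m]) * ext[m] for m in range(len(bits)) if m != i)
        w = p_xy(x, y) * math.exp(log_w)
        if bits[i] == 0:
            num += w
        else:
            den += w
    return math.log(num / den)


# Four-symbol example with natural 2-bit labels (level 0 = least significant bit).
probs = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}  # p_XY(x, y) for one fixed y


def p_xy(x, y):
    return probs[x]  # y is fixed in this toy example


labels = {x: (x & 1, (x >> 1) & 1) for x in probs}
print(demapper_llr(0, None, p_xy, labels, [0.0, 0.0]))   # log(0.6 / 0.4)
print(demapper_llr(0, None, p_xy, labels, [0.0, 50.0]))  # close to log(0.4 / 0.3)
```

Without extrinsic information, the LLR of level 0 only marginalizes $p_{XY}$ over the other bit; once level 1 is known to be 0 with high reliability, only the symbols 0 and 1 contribute.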


Remark 6.6. Since the update of $\lambda^{\mathrm{INT}}_{ij}$ is based on extrinsic LLRs, it is possible to
modify the scheduling of the previous algorithm and to start decoding level $i$ even if the
previous levels $[1, i-1]$ have not been entirely decoded. In that case, level $i$ cannot
be decoded entirely, but the extrinsic LLRs might be fed back to the message-passing
algorithm of the previous levels. Although this feedback is not necessary in principle, it
does improve the performance of practical implementations.


Remark 6.7. The optimality of multilevel reconciliation requires different compression
rates, and therefore different codes, for each level. To simplify the code design, it is
possible to use a single code across all levels, albeit with a reduced efficiency. Whereas
multilevel reconciliation is similar to multilevel coding, this simplified approach is
similar to bit-interleaved coded modulation. The decoding algorithm described in
Table 6.3 is easily adapted to syndrome coding with a single LDPC code.

Table 6.3 Belief-propagation algorithm for multilevel reconciliation

Initialization.
▷ For each $i \in [1, \ell]$, for each $j \in [1, n]$ and for each $k \in [1, k_i]$
  $\lambda^{\mathrm{INT}}_{ij} = \lambda^{\mathrm{EXT}}_{ij} = v^{(0)}_{ijk} = u^{(0)}_{ikj} = 0.$
Iterations across levels. For each level $i \in [1, \ell]$
▷ For each $j \in [1, n]$
  compute $\lambda^{\mathrm{INT}}_{ij}$ from $y_j$ and $\{\lambda^{\mathrm{EXT}}_{mj}\}_{m \in \mathcal{O}(j)\setminus\{i\}}$ according to (6.3).
▷ Iterations within level. For each $l \in [1, l_{\max}]$
  ▷ For each $j \in [1, n]$ and for each $k \in \mathcal{N}(ij)$
    $v^{(l)}_{ijk} = \lambda^{\mathrm{INT}}_{ij} + \sum_{m \in \mathcal{N}(ij)\setminus\{k\}} u^{(l-1)}_{imj}.$
  ▷ For each $k \in [1, k_i]$ and for each $j \in \mathcal{M}(ik)$
    $u^{(l)}_{ikj} = 2\tanh^{-1}\left((1 - 2s_{ik}) \prod_{m \in \mathcal{M}(ik)\setminus\{j\}} \tanh\left(\dfrac{v^{(l)}_{imk}}{2}\right)\right).$
▷ Extrinsic information. For each $j \in [1, n]$
  $\lambda^{\mathrm{EXT}}_{ij} = \sum_{m \in \mathcal{N}(ij)} u^{(l_{\max})}_{imj}.$
Hard decisions. For each $i \in [1, \ell]$ and for each $j \in [1, n]$
  $\hat{b}_{ij} = \frac{1}{2}\left(1 - \mathrm{sign}\left(\lambda^{\mathrm{INT}}_{ij} + \lambda^{\mathrm{EXT}}_{ij}\right)\right).$


6.5.2 Multilevel reconciliation of Gaussian sources 


In this section, we detail the construction of a multilevel reconciliation scheme for a
memoryless Gaussian source. We assume

$$X \sim \mathcal{N}(0, 1) \quad\text{and}\quad Y = X + N \quad\text{with}\quad N \sim \mathcal{N}(0, \sigma^2),$$

and we define the signal-to-noise ratio of this source as $\mathrm{SNR} = 1/\sigma^2$. Notice that this
type of source could be simulated by transmitting Gaussian noise over a Gaussian
channel. Before we design a set of LDPC codes for multilevel reconciliation, we first
need to construct a scalar quantizer, define a labeling, and compute the compression rate
required at each level.
By construction, the joint distribution of X and Y satisfies the property 


$$\forall (x, y) \in \mathbb{R}^2 \quad p_{YX}(y, x) = p_{YX}(-y, -x),$$

which it is desirable to preserve when designing the scalar quantizer. Specifically, we
restrict ourselves to quantizers $Q : \mathbb{R} \to \mathbb{R}$ such that

$$\forall (x, y) \in \mathbb{R}^2 \quad p_{YX'}(y, Q(x)) = p_{YX'}(-y, -Q(x)),$$

and we let $X' = Q(X)$. There is some leeway for choosing the number of quantization
intervals, but one should choose them such that $I(X';Y)$ is close to $I(X;Y)$ in order to
minimize the efficiency penalty in (6.2). Naturally, the higher the SNR is, the more
quantization intervals are needed.


Example 6.8. For simplicity, we consider quantizers with equal-width quantization 
intervals (except for the two intervals on the boundaries). If SNR = 3, the quantizer 
with 16 intervals maximizing the mutual information I(X’; Y) is as given below. 


Quantization intervals

(−∞, −2.348]
(−2.348, −1.808]
(−1.808, −1.412]
(−1.412, −1.081]
(−1.081, −0.787]
(−0.787, −0.514]
(−0.514, −0.254]
(−0.254, +0.000]
(+0.000, +0.254]
(+0.254, +0.514]
(+0.514, +0.787]
(+0.787, +1.081]
(+1.081, +1.412]
(+1.412, +1.808]
(+1.808, +2.348]
(+2.348, +∞)

The quantizer yields $I(X';Y) \approx 0.98$ bits, compared with $I(X;Y) = 1$ bit. The entropy
of the quantized source is $H(X') \approx 3.78$ bits.


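We assume that the intervals can be labeled with $\ell$ bits. Once a labeling $\{g_i\}_{i=1}^{\ell}$ has
been defined, we obtain a binary vector source $B^\ell \triangleq (B_1, \dots, B_\ell)$ with $B_i \triangleq g_i(X')$ for
$i \in [1, \ell]$, which is equivalent to the quantized source $X'$. The optimal compression
rates that would be required for ideal codes are then $R_i = H(B_i|YB^{i-1})$. These rates are
easily calculated by writing $R_i$ as

$$R_i = H(B_i|YB^{i-1}) = H(B^i|Y) - H(B^{i-1}|Y).$$

The entropy $H(B^i|Y)$ is given explicitly by

$$H(B^i|Y) = -\int_{\mathbb{R}} \sum_{b \in \mathrm{GF}(2)^i} \left(\int_{\mathcal{A}_i(b)} p_{XY}(x, y)\,dx\right) \log\left(\int_{\mathcal{A}_i(b)} p_{X|Y}(x|y)\,dx\right) dy,$$

with

$$\mathcal{A}_i(b) \triangleq \{x : (g_1(Q(x)), \dots, g_i(Q(x))) = b\}.$$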


Note that the optimal compression rates needed at each level depend on the specific 
choice of labeling considered. 
Example 6.9. We consider the quantizer obtained in Example 6.8. The simplest labeling
is the natural labeling given below, in which the first level consists of the least significant
bits.


Natural labeling 


Level 4   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
Level 3   0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
Level 2   0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
Level 1   0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1


The optimal rates for SNR = 3 are then the following. 


Level Compression rate Code rate for syndrome coding 


4 0.079 0.921 
3 0.741 0.259 
2 0.984 0.016 
1 0.987 0.013 


Another simple labeling is the reverse natural labeling, which is similar to natural 
labeling, but for which the first level consists of the most significant bits. 


Reverse natural labeling 


Level 4   0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Level 3   0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
Level 2   0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
Level 1   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1


The optimal rates for SNR = 3 are now the following. 


Level Compression rate Code rate for syndrome coding 


4 0.936 0.064 
3 0.801 0.199 
2 0.547 0.453 
1 0.507 0.493 


If we could construct ideal codes at all rates, then no labeling would be better than any
other. However, in practice, it is difficult to design efficient codes with compression rates
close to unity. Rather than designing such codes, it is actually easier not to compress
at all and to simply disclose the bits of the entire level. This simplification induces a small
penalty, but, for compression rates close to unity, this is more efficient than trying to
design a powerful code. For instance, for the natural labeling in Example 6.9, disclosing
levels 1 and 2 incurs a negligible efficiency loss. In addition, finite-length LDPC codes
do not have the performance of ideal codes; therefore, the code rates at each level must
be chosen to be below the ideal ones, which inflicts an efficiency penalty. Nevertheless,
one can still achieve high efficiencies, as illustrated by the following example.


Example 6.10. We consider the quantizer of Example 6.8 used in conjunction with 
natural labeling. Instead of the optimal rates computed in Example 6.9 for SNR = 3, we 
use the following: 


Level Compression rate Code rate 


4 0.14 0.86 
3 0.76 0.24 
2 1.0 0.0 
1 1.0 0.0 


Levels 1 and 2 are entirely disclosed so that there are only two codes to design, with
rates 0.24 and 0.86 for levels 3 and 4, respectively. We choose the rate-0.24 LDPC code
with degree distributions


A(x) = 0.249 17x + 0.163 92x? + 0.000 01x? + 0.159 92x5 + 0.023 23x° 
+ 0.080 87x'? + 0.019 58x"? + 0.036 39x? + 0.018 61x” 
+ 0.029 56x?” + 0.010 06x°° + 0.048 15x*? + 0.160 51x, 
p(x) = 0.1x4 + 0.9x°, 
and the rate-0.86 LDPC code with degree distributions

$$\lambda(x) = 0.227\,34x + 0.131\,33x^2 + 0.641\,33x^4, \qquad \rho(x) = x^{24}.$$


Figure 6.10 shows the error-rate versus efficiency performance obtained with the multilevel
reconciliation algorithm of Section 6.5.1 and various choices of blocklength. Note
that long codes can achieve a probability of error of $10^{-5}$ at an efficiency close to 90%.
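Two quick checks on Example 6.10 can be scripted. The Python sketch below computes the design rate $1 - \int_0^1 \rho / \int_0^1 \lambda$ of an ensemble from its edge-perspective degree distributions, here for the rate-0.86 code. It also computes the efficiency that the chosen compression rates would reach with ideal, error-free codes, using $\beta = (H(X') - R)/I(X;Y)$ with the rounded values $H(X') \approx 3.78$ bits and $I(X;Y) = 1$ bit from Example 6.8. The helper name and the dictionary encoding of the polynomials are our own choices.

```python
def design_rate(lam, rho):
    """Design rate 1 - (int_0^1 rho) / (int_0^1 lambda) for edge-perspective degree
    distributions given as {exponent: coefficient}, e.g. {1: 0.2} means 0.2 x."""
    int_lam = sum(c / (e + 1) for e, c in lam.items())
    int_rho = sum(c / (e + 1) for e, c in rho.items())
    return 1 - int_rho / int_lam


lam_086 = {1: 0.22734, 2: 0.13133, 4: 0.64133}
rho_086 = {24: 1.0}
print(round(design_rate(lam_086, rho_086), 4))  # 0.86

H_XQ, I_XY = 3.78, 1.0                # rounded values from Example 6.8
compression = [1.0, 1.0, 0.76, 0.14]  # levels 1..4, as chosen in Example 6.10
beta = (H_XQ - sum(compression)) / I_XY
print(round(beta, 2))                 # 0.88
```

The ideal-code efficiency of about 0.88 for these rates is in line with the efficiencies close to 90% reported for long codes in Figure 6.10.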


Secure communication over wiretap channels 


Although wiretap codes are not known for general channels, it is nevertheless possible
to construct practical codes that are based on sequential key-distillation strategies with
one-way reconciliation. In fact, for a WTC $(\mathcal{X}, p_{YZ|X}, \mathcal{Y}, \mathcal{Z})$, one can implement the
following four-stage protocol.
Figure 6.10 Error rate versus efficiency of a finite-length LDPC-based multilevel reconciliation
scheme.


1. Randomness sharing. Alice generates $n$ realizations of a random variable $X$ with
distribution $p_X$ and transmits them through the WTC. Bob and Eve observe $n$
realizations of correlated random variables $Y$ and $Z$, respectively. The resulting joint
distribution factorizes as $p_{XYZ} = p_{YZ|X}\,p_X$.

2. Information reconciliation. Alice computes syndromes using the multilevel
reconciliation protocol described in Section 6.5.1, and transmits them over the
main channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ using a channel error-correcting code. In principle,
the rate of the channel code can be chosen arbitrarily close to the capacity
$C_m$.

3. Privacy amplification. Alice chooses a hash function at random in a universal family 
and transmits this choice over the main channel using a channel error-correcting code. 
Alice and Bob distill a secret key K. 

4. Secure communication. Alice uses the key K to encrypt a message M with a one- 
time pad and transmits the encrypted message over the main channel using a channel 
error-correcting code. 


In general, this procedure is fairly inefficient because many channel uses are spent
generating a source and distilling a secret key instead of communicating secure messages.
The secure rate $R_s$ achieved by this procedure can be estimated as follows. For a given
input distribution $p_X$, about $nH(X|Y)$ bits are required for information reconciliation,
which can be transmitted with $nH(X|Y)/C_m$ additional channel uses. The choice of a
hash function in the family of hash functions in Example 4.6 requires about $nH(X)$ bits
for privacy amplification, which can be transmitted with another $nH(X)/C_m$ channel
uses. Finally, the secret key $K$ distilled by Alice and Bob contains on the order of
$n(I(X;Y) - I(X;Z))$ bits, which allows the secure transmission of the same number
of message bits and requires $n(I(X;Y) - I(X;Z))/C_m$ channel uses. Hence, counting also
the $n$ channel uses of the randomness-sharing stage, the secure communication rate is
approximately

$$R_s \approx \frac{C_m\,(I(X;Y) - I(X;Z))}{C_m + H(X|Y) + H(X) + I(X;Y) - I(X;Z)}.$$

Figure 6.11 Opportunistic use of fading for secure communication.


Example 6.11. Let us consider a binary WTC in which the main channel and the eavesdropper's channel are both binary symmetric channels with cross-over probabilities $p_m$ and $p_e$, respectively ($p_e > p_m$). Assume that the procedure described above is used with $X \sim \mathcal{B}(\tfrac{1}{2})$. Then $C_m = 1 - H_b(p_m)$, $H(X|Y) = H_b(p_m)$, $H(X) = 1$, and $I(X;Y) - I(X;Z) = H_b(p_e) - H_b(p_m) \leq 1$, so that the secure communication rate is at most

$$R_s \leq \frac{1 - H_b(p_m)}{2 + H_b(p_e) - H_b(p_m)}.$$

In the extreme case $p_m = 0$ and $p_e = \tfrac{1}{2} - \epsilon$ for some small $\epsilon > 0$, note that the secrecy capacity $C_s$ is on the order of 1 bit per channel use, but the rate $R_s$ achieved is only on the order of $\tfrac{1}{3}$ bit per channel use.


Example 6.12. Let us consider the quasi-static fading WTCs discussed in Section 5.2.3 
with full channel state information. As illustrated in Figure 6.11, the four-stage 
protocol can be implemented opportunistically in such a way that randomness sharing 
is performed only during secure time slots for which the legitimate receiver has a better 
instantaneous SNR than that of the eavesdropper. Reconciliation, privacy amplification, 
and secure communication are performed during the remaining time slots. This does 
not affect the secure rate because the key distilled during the protocol is secure against 
an eavesdropper who obtains the reconciliation and privacy-amplification messages 
perfectly. 

To avoid sharing randomness at a faster rate than that at which it can be processed, 
it could be necessary to use a fraction of the secure time slots for reconciliation, pri- 
vacy amplification, or secure communication; however, if the eavesdropper has a much 
higher SNR on average, all of the communication required for secret-key distillation is 
performed during insecure time slots. In such a case, the protocol achieves secure rates
close to the secrecy capacity of the channel. 
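The opportunistic scheduling rule of Example 6.12 can be sketched as follows. This is a minimal simulation under assumptions that are not in the text: i.i.d. Rayleigh fading, so each instantaneous SNR is exponentially distributed, and a fixed average SNR for each receiver. The function `schedule_slots` is hypothetical. It only labels slots and does not model coding.

```python
import random

def schedule_slots(n_slots, mean_snr_m, mean_snr_e, seed=0):
    """Assign each quasi-static fading slot to a stage of the four-stage protocol.

    Assumes i.i.d. Rayleigh fading (exponentially distributed SNRs) and full CSI.
    Randomness sharing is scheduled in secure slots (Bob's SNR above Eve's);
    all other slots carry reconciliation, privacy-amplification, and
    one-time-pad traffic, which need not be kept secret.
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(n_slots):
        snr_m = rng.expovariate(1.0 / mean_snr_m)   # Bob's instantaneous SNR
        snr_e = rng.expovariate(1.0 / mean_snr_e)   # Eve's instantaneous SNR
        plan.append("randomness sharing" if snr_m > snr_e else "processing")
    return plan

# Eavesdropper with a much higher average SNR: secure slots are rare
plan = schedule_slots(100_000, mean_snr_m=1.0, mean_snr_e=10.0)
print(plan.count("randomness sharing") / len(plan))  # close to 1/11 for these means
```

Under the exponential assumption, a slot is secure with probability $\bar{\gamma}_m/(\bar{\gamma}_m + \bar{\gamma}_e)$. When Eve's average SNR dominates, most slots are therefore available for the public processing stages. This is the situation described at the end of the example.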


Remark 6.8. Although we have assumed that privacy amplification is performed using 
the hash functions of Example 4.6, there is not much to be gained by choosing another 
universal family of hash functions. It can be shown that the minimum number of hash functions in a universal family $\mathcal{H} = \{h : \mathrm{GF}(2)^n \to \mathrm{GF}(2)^k\}$ is $2^{n-k}$ [67]; in practice, no families with fewer than $2^n$ functions are known. We could in principle do better by
performing privacy amplification with extractors. 


Bibliographical notes 


LDPC codes were invented by Gallager in 1963 [110], and experienced a renewed 
interest with the works of MacKay and others. Density evolution and the threshold prop- 
erty of LDPC codes under message-passing decoding were investigated by Richardson 
and Urbanke [111]. A Gaussian-approximation version of density evolution leading 
to a linear optimization problem was analyzed by Chung, Forney, Richardson, and 
Urbanke [112]. This enabled the design of irregular LDPC codes performing extremely 
close to the Shannon limit [113, 114, 115]. Although it is not known whether there exist 
sequences of capacity-achieving LDPC codes over arbitrary channels, it is possible to 
construct sequences of capacity-achieving LDPC codes for the erasure channel. The 
threshold for the block error probability of LDPC ensembles under belief-propagation 
decoding was analyzed by Jin and Richardson [109]. 

The first wiretap-code constructions were proposed for the wiretap channel of type II 
by Ozarow and Wyner [116]. Thangaraj, Dihidar, Calderbank, McLaughlin, and Merolla 
generalized these ideas to other wiretap channels [117] and proposed an explicit coset 
coding scheme based on LDPC codes for the erasure wiretap channel. Suresh, Sub- 
ramanian, Thangaraj, Bloch, and McLaughlin later proved that the same construction 
can be used to guarantee strong secrecy, albeit at lower rates [118, 119]. A similar 
construction based on two-edge-type LDPC codes was proposed by Rathi, Andersson, 
Thobaben, Kliewer, and Skoglund [120]. For discrete additive noise channels, Cohen and Zémor showed that any random linear code used in conjunction with coset coding is likely to satisfy a strong secrecy requirement [121]. All of the aforementioned
constructions are connected to nested codes, as highlighted by Liu, Liang, Poor, and 
Spasojević [122]. In a slightly different spirit, Verriest and Hellman analyzed the use of 
convolutional encoding over wiretap channels with noiseless main channel and binary 
symmetric eavesdropper’s channel [123]. For the wiretap channel of type II, Wei estab- 
lished a direct relation between the equivocation guaranteed by a linear code and its 
generalized Hamming weights [124]. Recently, there has also been a lot of interest in 
the design of wiretap codes based on polar codes [125, 126, 127, 128]. The polarization 
and capacity-achieving properties of polar codes over binary-input symmetric-output 
channels allow one to perform a security analysis that closely follows the equivocation
calculation used in Chapter 3. 

The challenges posed by the design of wiretap codes ensuring information-theoretic 
security have fostered the development of alternative secrecy metrics. For instance, 
Klinc, Ha, McLaughlin, Barros, and Kwak proposed the use of punctured LDPC codes 
to drive the eavesdropper’s bit error probability to an arbitrarily high level over the 
Gaussian wiretap channel [129]. In the same spirit, Belfiore, Oggier, and Solé proposed 
the use of lattice codes over the Gaussian wiretap channel [130, 131] and, since the error 
probability of lattices over Gaussian channels can be related to their theta series, they 
proposed a secrecy criterion based on the theta series of lattices. 

The idea of performing source coding with side information by using the syndromes 
of good channel codes was proposed by Wyner [132]. Liveris, Xiong, and Georghiades 
applied this idea to binary memoryless sources with LDPC codes and demonstrated that 
performance close to the optimal limit could be attained [133]. A threshold analysis of 
the message-passing algorithm was proposed by Chen, He, and Jagmohan [134].

The reconciliation of binary random variables was first considered by Brassard and 
Salvail in the context of quantum key distribution with the suboptimal yet computation- 
ally efficient algorithm CASCADE [56]. An extensive performance analysis of LDPC- 
based reconciliation for binary random variables was recently reported by Elkouss, 
Leverrier, Alléaume, and Boutros [58]. Multilevel reconciliation was first considered 
by Van Assche, Cardinal, and Cerf [59] for the reconciliation of continuous random 
variables, and implemented with turbo-codes by Nguyen, Van Assche, and Cerf [60]. 
Bloch, Thangaraj, McLaughlin, and Merolla reformulated the reconciliation of continu- 
ous random variables as a coded modulation problem and used LDPC codes to achieve 
near-optimal performance [61]. A special case of the general algorithm was developed 
independently by Ye, Reznik, and Shah [62, 78]. A multidimensional reconciliation 
scheme for continuous random variables based on the algebraic properties of octonions 
was also proposed by Leverrier, Alléaume, Boutros, Zémor, and Grangier [135], and was 
proved to be particularly efficient in the low-SNR regime. Conceptually, multilevel rec- 
onciliation is the source-coding counterpart of multilevel coding, a good survey of which 
can be found in [136] (see also [137] for bit-interleaved coded modulation). LDPC-based 
multilevel coding has been analyzed quite extensively, see for instance [138]. 


System aspects 


At the time of their initial conception, most common network protocols, such as the 
Transmission Control Protocol (TCP) and the Internet Protocol (IP), were not developed 
with security concerns in mind. When DARPA launched the first steps towards the 
packet-switched network that gave birth to the modern Internet, engineering efforts were 
targeted towards the challenges of guaranteeing reliable communication of information 
packets across multiple stations from the source to its final destination. The reasons for 
this are not difficult to identify: the deployed devices were under the control of a few 
selected institutions, networking and computing technology was not readily available to 
potential attackers, electronic commerce was a distant goal, and the existing trust among 
the few users of the primitive network was sufficient to allow all attention to be focused 
on getting a fully functional computer network up and running. 

A few decades later, with the exponential growth in the number of users, devices, and
connections, issues such as network access, authentication, integrity, and confidentiality
became paramount for ensuring that the Internet and, more recently, broadband wireless 
networks could offer services that are secure and ultimately trusted by users of all ages 
and professions. By then, however, the layered architecture, in which the fundamental 
problems of transmission, medium access, routing, reliability, and congestion control are 
dealt with separately at different layers, was already ingrained in the available network 
devices and operating systems. To avoid redesigning the entire network architecture 
subject to the prescribed security guarantees, the solutions adopted essentially resort 
to adding authentication and encryption to the various layers of the existing protocol 
stack. 

In this chapter, we shall focus our attention on the security of wireless networks as a 
case study for the implementation of physical-layer security. This class of systems can 
be deemed an extreme case in that they combine the liabilities inherent to the broadcast 
property of the wireless medium with the many vulnerabilities of the Internet, with which 
they share the basic TCP/IP networking architecture and several essential communica- 
tion protocols. On the basis of a critical overview of existing security mechanisms at 
each layer of the network architecture, we identify how physical-layer security can be 
integrated into the system with clear benefits in terms of confidentiality and robustness 
against active attacks. 
Basic security primitives


Before elaborating on how each networking layer implements the standard security 
measures against known attacks, we discuss briefly the basic building blocks currently 
used in almost every security sub-system. For this purpose, we focus on the three main 
security services: integrity, confidentiality, and authenticity. Other services, such as non- 
repudiation and access control, often require solutions to be found in the realm of policy 
rather than among the engineering disciplines. 

In most applications of relevance, the ruling paradigm is that of computational secu- 
rity. In other words, the designers of the system implicitly assume that the adversary has 
limited computing power and is thus unable to break strong ciphers or overcome math- 
ematical problems deemed hard to solve with classical computers. Existing systems are 
typically secured through a mix of symmetric encryption (confidentiality), hash functions 
(integrity), and public-key cryptography (secret-key distribution and authentication). 


Symmetric encryption 


Under the assumption that the legitimate communication partners are in possession of 
a shared secret key, as illustrated in Figure 1.1, symmetric encryption algorithms (or 
ciphers) are designed to ensure that (a) the plaintext message to be sent can easily be 
converted into a cryptogram using the secret key, (b) the cryptogram can easily be re- 
converted into the plaintext with the secret key, and (c) it is very hard, if not impossible, 
to recover the plaintext from the cryptogram without the key in useful time (a property 
also referred to as a one-way trapdoor function). Useful time means here that an attacker 
must be unable to break the encryption while there is some advantage in learning the 
sent message. 

Block ciphers achieve this goal by breaking the plaintext into blocks with a fixed 
number of bits and applying an array of substitutions and permutations that depend 
on the secret key. State-of-the-art block ciphers employ the principles of diffusion and 
confusion. The former ensures that each input plaintext bit influences almost all output 
ciphertext symbols, whereas the latter implies that it is hard to obtain the key if the 
ciphertext alone is available for cryptanalysis. 

The most prominent examples are the Data Encryption Standard (DES) and the 
Advanced Encryption Standard (AES), both of which emerged from standardization 
efforts by the US National Institute of Standards and Technology (NIST). The DES uses 
a so-called Feistel architecture, which implements diffusion and confusion by repeating 
the same substitution and permutation operations in multiple identical rounds, albeit 
with different sub-keys that are derived from the shared secret key. The AES improves 
over the DES by increasing the block length, implementing substitution and permutation 
operations that are highly non-linear, and allowing very fast execution. Special modes 
of encryption are used when (a) the size of the plaintext is not a multiple of the defined 
block length or (b) the same input block appears more than once and the corresponding 
output ciphertext blocks must be different. 
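The Feistel structure described above can be illustrated with a toy cipher. The sketch mirrors the DES block and round sizes (64-bit blocks, two 32-bit halves, 16 rounds). The round function and key schedule, however, are hypothetical stand-ins built from SHA-256 and are not the DES S-boxes or permutations. The point it shows is structural: decryption reuses the same rounds with the sub-keys in reverse order, whatever the round function is.

```python
import hashlib

def round_function(half: int, subkey: bytes) -> int:
    """Toy round function F(R, K_r): keyed SHA-256 of the half-block, cut to 32 bits."""
    digest = hashlib.sha256(subkey + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def derive_subkeys(key: bytes, rounds: int) -> list:
    """Toy key schedule: one sub-key per round derived from the shared secret key."""
    return [hashlib.sha256(key + bytes([r])).digest() for r in range(rounds)]

def _feistel(block: int, subkeys: list) -> int:
    left, right = block >> 32, block & 0xFFFFFFFF
    for subkey in subkeys:
        # One round: (L, R) -> (R, L xor F(R, K_r)); invertible for any F
        left, right = right, left ^ round_function(right, subkey)
    return (right << 32) | left  # final swap of the halves, as in DES

def feistel_encrypt(block: int, key: bytes, rounds: int = 16) -> int:
    return _feistel(block, derive_subkeys(key, rounds))

def feistel_decrypt(block: int, key: bytes, rounds: int = 16) -> int:
    # Identical structure; only the order of the sub-keys is reversed
    return _feistel(block, list(reversed(derive_subkeys(key, rounds))))

key = b"shared secret"
ct = feistel_encrypt(0x0123456789ABCDEF, key)
print(hex(ct), hex(feistel_decrypt(ct, key)))
```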
Instead of dividing the input data into blocks, stream ciphers produce one encrypted
symbol for every input symbol in a continuous fashion. This mode of operation is similar 
to that of the one-time pad described in Chapter 1. However, rather than mixing each input 
symbol with perfectly random key symbols, typical stream ciphers use pseudo-random 
sequences that are functions of the secret key. 
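A stream cipher differs from the one-time pad only in where the key symbols come from. The sketch below expands a short secret key into a pseudo-random keystream and XORs it with the data. The keystream generator (SHA-256 of key, nonce, and a counter) is a hypothetical toy chosen for brevity. Deployed stream ciphers, such as ChaCha20, use dedicated and extensively analysed constructions.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy pseudo-random keystream: SHA-256(key || nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR each data byte with one keystream byte, as in a one-time pad."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key, nonce = b"k" * 16, b"n" * 8
ct = stream_xor(key, nonce, b"attack at dawn")
print(stream_xor(key, nonce, ct))  # applying the same keystream again recovers the plaintext
```

Unlike a true one-time pad, all of the randomness here lives in the key. Reusing the same (key, nonce) pair for two messages leaks their XOR, so each pair must be used only once.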

Evaluating the security of symmetric encryption mechanisms is widely considered to 
be a difficult task. In contrast to the case of the one-time pad in information-theoretic 
security, there are no mathematical proofs for the assured level of secrecy. Standard 
practices include testing the randomness of the output ciphertext (the closer to perfect 
randomness the better), counting the number of operations required for a brute-force 
attack (i.e. trying out all possible keys) and cryptanalysis with known or chosen plaintext-ciphertext pairs. There is also growing consensus that the encryption algorithm should be made public, so that anyone can come up with and publish efficient attacks against
the most widely used ciphers. It follows that the secrecy of the encrypted information 
must depend solely on the secret key. 


Public-key cryptography 


The main drawback of symmetric encryption lies in the need for the legitimate com- 
munication partners to share a secret key. Under the computational security paradigm, 
this can be solved in an elegant way by means of public-key cryptography (also known 
as asymmetric cryptography). Instead of a secret key, each of the legitimate partners 
uses a pair of different keys, more specifically a private key, which is not shared, and 
a public key, which is available to the legitimate receiver and any potential attacker. 
The encryption algorithm is designed to ensure that the private key and the public key 
can be used interchangeably, i.e. a cryptogram generated with the public key can be 
decrypted only with the corresponding private key and vice versa. Thus, if Alice wants 
to send a confidential message to Bob, she can use his public key to encrypt the message 
knowing that only he will be able to recover the message, because no one else knows 
his private key. If the objective is to authenticate the message using a digital signature, 
Alice can encrypt the message using her private key, so that Bob can verify the sender’s 
authenticity by decrypting with her public key. 

The security of public-key cryptography relies on the computational intractability of 
certain mathematical operations, most notably the prime factorization of large integers 
or the inversion of the discrete logarithm function. The RSA scheme, named after its creators Rivest, Shamir, and Adleman, puts public-key cryptography into practice by exploiting the properties of modular exponentiation over the integers modulo a product of two large primes. To generate the public key and the private key, each user must first select two large primes p and q (about 100 digits each) and compute both their product n and its Euler totient function $\varphi(n) = (p-1)(q-1)$. The next step is to pick a number e uniformly at random among all integers $z < \varphi(n)$ that are relatively prime to $\varphi(n)$, and to compute the multiplicative inverse $d < \varphi(n)$ of e modulo $\varphi(n)$. The public key and the private key are then given by (e, n) and (d, n), respectively. To encrypt a message block m satisfying $0 < m < n$, Alice takes Bob's public key $K_e = (e, n)$ and computes the cryptogram c according to

$$c = m^e \bmod n. \tag{7.1}$$

Using his private key $K_d = (d, n)$, Bob can decrypt m by calculating

$$m = c^d \bmod n. \tag{7.2}$$

The correctness of the encryption and decryption mechanisms is ensured by the fact that e and d are multiplicative inverses modulo $\varphi(n)$, that is,

$$ed = 1 + k\varphi(n) \tag{7.3}$$

for some integer k. Hence, by Euler's theorem, which states that $m^{\varphi(n)} \equiv 1 \pmod{n}$ whenever m and n are relatively prime,

$$c^d \equiv m^{ed} \equiv m^{1 + k\varphi(n)} \equiv m \cdot \bigl(m^{\varphi(n)}\bigr)^k \equiv m \pmod{n}. \tag{7.4}$$

The identity also holds for the few messages that share a factor with n, as can be checked separately modulo p and modulo q.
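Equations (7.1) through (7.4) can be checked on a toy instance. The Python sketch below uses the small textbook parameters p = 61, q = 53, e = 17, which are far too small for real use. It relies on the built-in three-argument `pow` for modular exponentiation and on `pow(e, -1, phi)` (Python 3.8 or later) for the modular inverse. The helper names are hypothetical, and the sketch omits the padding that deployed RSA requires.

```python
from math import gcd

def rsa_keygen(p: int, q: int, e: int):
    """Textbook RSA key generation from two primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)           # Euler totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)               # multiplicative inverse of e modulo phi(n)
    return (e, n), (d, n)             # public key, private key

def rsa_apply(key, m: int) -> int:
    """Compute m^k mod n: (7.1) with the public key, (7.2) with the private key."""
    k, n = key
    return pow(m, k, n)

public, private = rsa_keygen(61, 53, 17)   # toy primes, illustration only
c = rsa_apply(public, 65)
print(public, private, c, rsa_apply(private, c))  # -> (17, 3233) (2753, 3233) 2790 65
```

Swapping the roles of the two keys gives the digital-signature direction described earlier: `rsa_apply(public, rsa_apply(private, m))` also returns m.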


The RSA scheme and similar public-key cryptography schemes typically use large 
keys (1024 to 2048 bits), which renders a brute-force attack practically unfeasible with 
classical computers. It is now widely accepted that, if stable quantum computers can 
be built with a moderate number of quantum bits, then it will be possible to factorize 
n and obtain the primes p and q in useful time. This would break hard cryptographic 
primitives such as those used in public-key cryptography. More recently, elliptic curves have emerged as groups over which strong public-key encryption can be obtained with smaller key sizes. However, their vulnerabilities with respect to quantum attacks are similar.


Hash functions 


The standard way to ensure the integrity or the message authenticity of the encrypted 
data is to generate a fixed-length message digest using a one-way hash function. As 
their names indicate, the digest is much smaller than the original data and the one-way 
hash function cannot be reversed. More specifically, it is expected that the hash function 
satisfies the following properties. 


• Pre-image-resistant. For any hash value or digest it is hard to find a pre-image that generates that hash value.

• Second-pre-image-resistant or weakly collision-free. Given a hash value and the message that originated it, finding another message that generates the same value is hard.

• Collision-resistant or strongly collision-free. It is hard to find two messages that generate the same hash value or digest.


The digest typically results from one or more mixing and compression operations on 
message blocks, which are sometimes implemented using a block cipher and a special 
encryption mode. Flipping only one bit in the original data should already result in a 
very different digest with practically no correlation with the previous digest. Examples 
of hash functions currently in use include the SHA family of hash functions standardized 
by the NIST. Said cryptographic one-way hash functions should not be confused with the universal family of hash functions used in information-theoretic secret-key generation, as defined in Chapter 4.

Figure 7.1 Typical security sub-system (sender side).
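The claim that flipping a single input bit yields an essentially unrelated digest can be observed directly. The snippet below uses SHA-256, a member of the NIST SHA family, through Python's standard `hashlib` module. The message text is an arbitrary example. For an ideal hash, about half of the 256 output bits should change.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = bytearray(b"Alice sends 100 EUR to Bob")
flipped = bytearray(message)
flipped[0] ^= 0x01                      # flip a single bit of the input

d1 = hashlib.sha256(message).digest()
d2 = hashlib.sha256(flipped).digest()
print(len(d1) * 8, bit_difference(d1, d2))  # digest length in bits, number of changed bits
```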


Authentication, integrity, and confidentiality 


In principle, the basic mechanisms of public-key cryptography would be sufficient to 
achieve the fundamental security goals of authentication, integrity, and confidentiality. 
However, since these mechanisms are very demanding from a computational point of 
view, most secure systems use a combination of public-key cryptography, symmetric 
encryption, and one-way hashing. 

Figure 7.1 shows a typical solution from the point of view of the sender, Alice. She starts by generating the message digest $f_h(m)$, which she encrypts asymmetrically ($E_a$) with her private key. The fact that Alice is the only person in possession of this private key assures the required sender authentication. Since the message digest is much smaller than the message, the computational overhead that comes with public-key cryptography is not a cause for concern. The authenticated digest, which corresponds to a digital signature, is then multiplexed with the original message. The output is then protected efficiently via symmetric encryption with the session key $k_s$. To share the session key with Bob securely, Alice uses public-key cryptography once again, this time with Bob's public key.

On the reception side, illustrated in Figure 7.2, Bob recovers $k_s$ by decrypting it with his private key. He then uses this key to obtain the sent message, to which he applies the known hash function $f_h$. Using Alice's public key, he can decrypt the sent message
digest, and compare it with the result he obtained in the previous step. If the computed 
hash value is equal to the decrypted message digest, then he can rest assured that the 
integrity of the message and the authenticity of the sender are guaranteed. 
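The sender and receiver pipelines of Figures 7.1 and 7.2 can be put together in a compact sketch. Every component is a deliberately insecure stand-in chosen so that the example stays self-contained:

- textbook RSA without padding, with keys built from publicly known Mersenne primes;
- SHA-256 as the digest $f_h$;
- a hash-based XOR keystream as the symmetric cipher.

All function names and the fixed 128-byte signature field are hypothetical.

```python
import hashlib
import secrets

E = 65537  # common public exponent

def keygen(p: int, q: int):
    """Textbook RSA key pair (public, private) from primes p, q."""
    phi = (p - 1) * (q - 1)
    return (E, p * q), (pow(E, -1, phi), p * q)

# Toy key pairs from publicly known Mersenne primes: NOT secret, illustration only
ALICE_PUB, ALICE_PRIV = keygen(2**107 - 1, 2**521 - 1)
BOB_PUB, BOB_PRIV = keygen(2**127 - 1, 2**607 - 1)

def rsa(key, x: int) -> int:
    k, n = key
    return pow(x, k, n)

def f_h(m: bytes) -> int:
    """Message digest f_h(m): SHA-256 read as an integer (smaller than both moduli)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big")

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter-mode keystream."""
    blocks = len(data) // 32 + 1
    ks = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(blocks))
    return bytes(d ^ k for d, k in zip(data, ks))

SIG_LEN = 128  # bytes reserved for the signature; Alice's modulus has about 628 bits

def send(m: bytes):
    """Sender side (Figure 7.1): sign the digest, multiplex, encrypt, wrap the session key."""
    signature = rsa(ALICE_PRIV, f_h(m))               # digital signature on the digest
    payload = signature.to_bytes(SIG_LEN, "big") + m  # multiplex signature and message
    k_s = secrets.token_bytes(32)                     # fresh session key
    wrapped_key = rsa(BOB_PUB, int.from_bytes(k_s, "big"))
    return wrapped_key, xor_stream(k_s, payload)

def receive(wrapped_key: int, ciphertext: bytes):
    """Receiver side (Figure 7.2): unwrap k_s, decrypt, recompute and compare the digest."""
    k_s = rsa(BOB_PRIV, wrapped_key).to_bytes(32, "big")
    payload = xor_stream(k_s, ciphertext)
    signature, m = int.from_bytes(payload[:SIG_LEN], "big"), payload[SIG_LEN:]
    return m, rsa(ALICE_PUB, signature) == f_h(m)

wrapped, ct = send(b"Meet at noon.")
print(receive(wrapped, ct))  # (b'Meet at noon.', True)
```

Only the short digest and the 32-byte session key go through the expensive public-key operations, while the message itself is handled by the symmetric cipher. This is the division of labour the figures describe.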


Key-reuse and authentication 


Suppose that Alice and Bob share a secret key k. Although in theory it is possible for Alice and Bob to refresh k by generating a new random key k' for every message and exchanging k' securely, the overhead incurred in real systems in terms of computation and number of transmissions forces practitioners to compromise and reuse the same key multiple times. Feasible alternatives include scheduled modifications by means of simple operations (e.g. a shift or an XOR with another key sequence) and re-keying by means of pseudo-random generators, which use the original key as the seed for subsequent keys.

Figure 7.2 Typical security sub-system (reception side).

Although changing the key in these ways intuitively makes it harder for the attacker 
to break the security of the system, from an information-theoretic point of view using 
functions of the key for encryption does not change the amount of randomness or the 
uncertainty of the attacker with respect to the protected messages and the secret key. 
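The remark that deterministic re-keying adds no randomness can be made tangible. In the sketch below, a hypothetical hash-chain rule ($k_{j+1} = \mathrm{SHA256}(k_j)$) turns a deliberately tiny 16-bit original key into 256-bit session keys that look fresh. An attacker who observes one derived key still only has to search over the $2^{16}$ possible seeds. The rule and the key size are assumptions chosen for illustration.

```python
import hashlib
from typing import List, Optional

def rekey_sequence(seed_key: bytes, count: int) -> List[bytes]:
    """Hypothetical re-keying rule: k_{j+1} = SHA-256(k_j), starting from the original key."""
    keys, k = [], seed_key
    for _ in range(count):
        k = hashlib.sha256(k).digest()
        keys.append(k)
    return keys

def recover_seed(observed: bytes, position: int, seed_bytes: int = 2) -> Optional[bytes]:
    """Exhaustive search over all seeds: the derived keys carry no entropy beyond the seed."""
    for s in range(256 ** seed_bytes):
        candidate = s.to_bytes(seed_bytes, "big")
        if rekey_sequence(candidate, position + 1)[position] == observed:
            return candidate
    return None

seed = b"\x12\x34"                     # deliberately tiny 16-bit original key
keys = rekey_sequence(seed, 5)         # five 256-bit session keys
print(recover_seed(keys[2], 2) == seed)
```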

An active attacker, say Charles, with access to the sent cryptograms will seek not 
only to break the key but also to impersonate the legitimate sender, either by modifying 
intercepted messages or by generating new ones using the captured secret key. If we 
assume that in the worst case the attacker has unlimited computing power, the natural 
question is then whether Charles can make Bob believe that the faked messages come 
from Alice. 

The scenario described here admits an information-theoretic treatment. Alice wants to send n messages $M_i$ with $i \in [\![1, n]\!]$ in sequence at different points in time. To authenticate the messages, she uses the shared secret key K, and generates one $Y_i = F(M_i, K)$ for every $M_i$ to be sent. At time i, Bob must use his knowledge of the key K and all past cryptograms $Y_j$ with $j \in [\![1, i-1]\!]$ to determine whether $Y_i$ is authentic or not. The decision process can be framed as a hypothesis test, in which $Y_i$ was generated either by Alice or by an attacker. A type-I error occurs if Bob rejects the message even though it was actually generated by Alice. On the other hand, Bob incurs a type-II error if he accepts the message when it was generated by the attacker. If we set the probability of type-I errors to zero, i.e. if Bob is expected to accept every authentic message sent by Alice, then the probability $P_{\mathrm{sa}}$ that the attack is successful is lower bounded by

$$P_{\mathrm{sa}} \geq 2^{-H(K)/(n+1)}. \tag{7.5}$$


In simple terms, the difficulty in guessing the key is reduced significantly with every 
re-utilization of the key. 
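Evaluating the bound (7.5) for a concrete key size shows how quickly key reuse erodes the protection. The sketch assumes a uniformly distributed 128-bit key, so that $H(K) = 128$; this is an illustrative choice. The function name is hypothetical.

```python
def impersonation_bound(key_entropy_bits: float, n_messages: int) -> float:
    """Lower bound 2^{-H(K)/(n+1)} of (7.5) on the attacker's success probability."""
    return 2.0 ** (-key_entropy_bits / (n_messages + 1))

for n in (1, 3, 15, 127):
    print(n, impersonation_bound(128, n))
```

With one message the bound is $2^{-64}$. After 15 messages it has risen to $2^{-8}$, and after 127 messages it reaches 1/2.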

Table 7.1 The TCP/IP architecture, features, and security mechanisms

Level  Layer        Tasks                                            Security mechanisms
5      Application  Runs processes and applications                  End-to-end cryptography
4      Transport    Reliable communication and congestion control    Secure Sockets Layer (SSL) and Transport Layer Security (TLS)
3      Network      Routing and forwarding                           Internet Protocol Security (IPSec)
2      Link         Medium-access control                            End-to-end cryptography
1      Physical     Transmission                                     Spreading against narrow-band jamming

Carter-Wegman universal families of hash functions can be used to ensure unconditional authentication. In other words, it is possible to build digital-signature schemes or authentication tags that cannot be forged or modified even by an attacker with infinite computing power. In precise terms, we say that the scheme is unbreakable with probability p if an attacker who gains access to m and the corresponding authentication tag f(m) has a probability less than or equal to p of guessing the tag of another message m'. To transmit an authenticated message m, the legitimate partners must share a secret key, which consists of a function f and, for each message number i, a secret one-time string. The function f is chosen uniformly among a universal family of hash functions that map the message set $\mathcal{M}$ to the set of tags $\mathcal{T}$. The fraction of functions in the same class that map m' (different from m) to a particular tag t' is $1/|\mathcal{T}|$. Thus, setting $|\mathcal{T}| \geq 1/p$ ensures that the attacker has a success probability of at most p. To authenticate multiple messages, the sender computes the authentication tag f(m) and then performs a bit-wise XOR of the tag with the secret one-time string associated with the message number i, which is also known at the receiver. Each one-time string can be used only once. If the authentication tag is k bits long, then the attacker cannot find a message whose tag can be guessed with probability higher than $1/2^k$.
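A minimal sketch of Wegman-Carter-style authentication follows. It assumes the common formulation in which the shared key consists of a hash-function index a and one fresh secret pad $r_i$ per message number i. Two choices are assumptions for illustration, not taken from the text:

- Messages and tags live in $\mathrm{GF}(P)$ with the prime $P = 2^{127} - 1$.
- The family is $f_a(m) = a\,m \bmod P$, and the pad is added modulo P instead of XOR-ed bit-wise, which plays the same masking role.

For $m' \neq m$, exactly one value of a maps the difference $m' - m$ to any given tag difference, so a forgery succeeds with probability $1/P$.

```python
import secrets

P = 2**127 - 1  # prime modulus; messages and tags are elements of GF(P) (illustrative choice)

def keygen(n_messages: int):
    """Shared secret: hash index a (selects f_a) plus one one-time pad r_i per message number."""
    a = secrets.randbelow(P)
    pads = [secrets.randbelow(P) for _ in range(n_messages)]
    return a, pads

def tag(key, i: int, m: int) -> int:
    """Tag of the i-th message: f_a(m) + r_i mod P, with f_a(m) = a * m mod P."""
    a, pads = key
    return (a * m + pads[i]) % P

def verify(key, i: int, m: int, t: int) -> bool:
    """Receiver recomputes the tag with the same a and the same pad r_i."""
    return tag(key, i, m) == t

key = keygen(3)
t0 = tag(key, 0, 42)
print(verify(key, 0, 42, t0), verify(key, 0, 43, t0))  # forged message is rejected
```

Because each pad $r_i$ is used for a single message, the observed tags reveal nothing about a. This lets the same hash index be reused across many messages.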


Security schemes in the layered architecture 


The typical architecture of contemporary networks evolved from the Open Systems
Interconnection (OSI) Reference Model to the prevalent five-layer TCP/IP architecture,
which is depicted in Table 7.1. The tasks listed for each layer reflect the initial concerns 
with the reliable communication of information packets from one end of the network 
to the other. Processes and applications running at the top layer generate data streams, 
which are segmented and encapsulated into separate packets. Computers and routers then 
forward the packets across multiple links until they reach the destination. Communication 
links may exist over different physical media with multiple transmitters, thus requiring 
specific transmission schemes and channel arbitration among various devices. 

Each layer provides the adjacent upper layer with an abstraction of the network. In 
simple terms, the link layer sees a channel with or without collisions of transmitted 
packets, the network layer sees a graph of links with certain rates, the transport layer
sees an end-to-end connection with or without packet losses, and the application layer 
sees a bit-pipe with or without delivery guarantees. It is striking that in their original 
form none of the layers takes security aspects into consideration. 

To fill this gap, the X.800 standard defined a set of fundamental security services based on the original OSI networking model. As summarized in Table 7.2, these services are divided into five categories and fulfill specific security objectives. In the following, we shall demonstrate that some of these services can be enhanced using physical-layer techniques, which at best provide information-theoretic security and at the very least cause significant degradation of the signals observed by the eavesdropper. But first, we set the stage for the integration of physical-layer security into contemporary networks by elaborating on the security mechanisms that have been introduced at each layer to ensure integrity, authentication, and confidentiality.

Table 7.2 Physical-layer security vis-à-vis the X.800 standard

Category              Service                                   Objective
Authentication        Peer-entity authentication                Confidence in the identity of communication partners
                      Data-origin authentication                Confidence in the source of the received data
Access control        Access control                            Prevent unauthorized use of a resource
Data confidentiality  Connection confidentiality                Protect a connection
                      Connectionless confidentiality            Protect a single data block
                      Selective-field confidentiality           Protect selected fields on a connection or in a data block
                      Traffic-flow confidentiality              Prevent information acquisition from traffic observation
Data integrity        Connection integrity with recovery        Detect and correct active attacks on a connection
                      Connection integrity without recovery     Detect active attacks on a connection
                      Selective-field connection integrity      Detect active attacks on specific fields
                      Connectionless integrity                  Detect modification and some forms of replay of a single data block
                      Selective-field connectionless integrity  Detect active attacks on specific fields of a data block
Non-repudiation       Origin                                    Prove that a data block was sent by the source
                      Destination                               Prove that a data block was received by the destination


Application layer 

The security mechanisms implemented at the uppermost layer of the protocol stack 
depend heavily on the application under consideration. A security architecture similar 
to the one depicted in Figure 7.1 is often implemented to secure email or web-browsing 
services by means of end-to-end cryptography. Public-key cryptography is used exten- 
sively for the purpose of authentication and the sharing of session keys. To ensure the 
authenticity of public keys, a public-key infrastructure must include trusted third parties, 
also called certification authorities (CAs). The service they provide consists of manag- 
ing large directories of registered users and devices, issuing digitally signed certificates 
of their public keys upon request. These certificates are then used to establish trusted
connections among clients and servers. 

With the advent of peer-to-peer systems, in which every node is both a client and a 
server to other nodes in the network, typical network functions such as packet forwarding 
and flow control are increasingly assigned to so-called overlay networks, which consist 
of virtual logical links among processes that run in different computing nodes. This 
form of virtualization simplifies network management and information dissemination, 
while allowing the construction of topologies that increase the overall robustness of 
the networked application. The primary concern in this context relates to active attacks 
by which information flows are compromised through the injection of erroneous pack- 
ets. The solution here is to enforce integrity checks by means of cryptographic hash 
functions. 


Transport layer 

Transport protocols providing connection-oriented or connectionless services are typi- 
cally implemented in the operating system. Security extensions, most notably the Secure 
Sockets Layer (SSL) and the Transport Layer Security (TLS) standard of the Internet 
Engineering Task Force (IETF), provide message encryption and server authentication, 
with optional support for client authentication. The system stores a list of trusted CAs,
against which the certificates presented by servers are verified before session keys are
exchanged and a secured connection is established. The schemes employed are again very
similar to the one shown in Figure 7.1, and the transport headers of the transmitted
segments are modified to reflect the security primitives employed.
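As a small illustration of the trusted-CA list, Python's standard `ssl` module builds a client context that loads the platform's CA certificates and makes server authentication mandatory. The sketch only inspects the context; no connection is opened, and the number of loaded CA certificates depends on the platform.

```python
import ssl

# Client-side context: loads the platform's trusted CA certificates and requires
# the server to present a certificate chain that verifies against them.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # server authentication is mandatory
print(ctx.check_hostname)                    # certificate must match the host name
print(ctx.cert_store_stats())                # loaded CA certificates (platform-dependent)

# Client authentication is optional in TLS: a client holding its own certificate
# would additionally call ctx.load_cert_chain(certfile, keyfile).
```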


Network layer 

Since the main task of network routers is to store packets and forward them towards the 
right next hop until the destination is reached, their operation is limited to the network 
layer and below. Consequently, all security mechanisms are implemented at the level 
of IP datagrams. The IPSec standard allows for the establishment of a unidirectional 
network-level security association and the introduction of a so-called authentication 
header, which includes a connection identifier and the digital signature of the sending 
node. The encapsulating security payload (ESP) protocol is then responsible for ensuring 
the confidentiality of the data by means of encryption. Beyond the datagrams that carry
application data, it is also very important to protect the control traffic, which ensures that
the routing tables are correct and up to date. Successful attacks on link advertisements 
and other control messages can make it impossible for routers to forward the packets 
correctly, which eventually leads to a serious disruption of network services. 

To protect the network against intrusion attacks, it is common to install firewalls at 
the gateway nodes. Their goal is to filter out packets that are identified as suspicious 
on the basis of predefined rules that are under the discretion of network administrators. 
Beyond enforcing some form of access control, firewalls prevent attackers from learning 
about the available network resources and how they are mapped to different addresses 
and ports. In addition, they constitute a first line of defense against denial-of-service 
(DoS) attacks in which attackers flood the network with fake connection requests or
other forms of signaling. 
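The rule-based filtering described above can be sketched as a first-match packet filter with a default-deny policy. The `Rule` and `Packet` types and the example policy are hypothetical; real firewalls match on many more header fields and keep connection state.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str               # "allow" or "deny"
    src: str                  # source network in CIDR notation
    dst_port: Optional[int]   # None matches any destination port
    proto: str = "tcp"


@dataclass(frozen=True)
class Packet:
    src: str
    dst_port: int
    proto: str


def filter_packet(rules, pkt, default="deny"):
    """First-match filter: the first rule that matches the packet decides."""
    for r in rules:
        if (r.proto == pkt.proto
                and ip_address(pkt.src) in ip_network(r.src)
                and (r.dst_port is None or r.dst_port == pkt.dst_port)):
            return r.action
    return default  # default-deny also hides unlisted services from outside probes


# Hypothetical policy chosen by a network administrator.
RULES = [
    Rule("deny", "203.0.113.0/24", None),  # known-bad address block
    Rule("allow", "0.0.0.0/0", 443),       # public HTTPS server
    Rule("allow", "10.0.0.0/8", 22),       # SSH only from the internal network
]

print(filter_packet(RULES, Packet("198.51.100.7", 443, "tcp")))  # allow
print(filter_packet(RULES, Packet("198.51.100.7", 22, "tcp")))   # deny
print(filter_packet(RULES, Packet("203.0.113.9", 443, "tcp")))   # deny
```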


Link layer 

While the network layer is agnostic with respect to the underlying channel, the link layer, 
which governs the access to the channel by multiple devices, depends naturally on the 
communication medium of choice. It is thus not surprising that security mechanisms
are more often found at the network layer and above, where the same security
sub-system can operate irrespective of the physical characteristics of the link over which
communication is taking place.

However, whereas security primitives implemented at the network layer address only 
the vulnerabilities pertaining to end-to-end connectivity, link-layer security is concerned 
with the direct connections among devices. In contrast with wired networks, where access 
to the communication link requires tapping a cable or optical link, wireless networks 
are easy targets for eavesdroppers and intruders. All that is required is a computer with 
a wireless interface, both of which have now become very affordable commodities. 
Thus, security solutions at the link layer of wireless communications systems are aimed 
at ensuring that only authorized devices can communicate over the channel. One such 
instance is the Extensible Authentication Protocol (EAP), which defines a security 
framework over which a device can prove its identity and gain access to a wireless 
network. 


Practical case studies 


The previous overview of the security mechanisms that are typically implemented at the 
various layers above the physical layer underlines the fact that the individual solutions for 
each layer are tailored to its specific tasks with little relation to the security primitives 
implemented elsewhere in the system. A device may, for example, be authenticated 
over a wireless link and operate IPSec at the network layer, while at the application 
layer a browser is accessing a web site securely using the end-to-end cryptography 
implemented by TLS. In some cases, this patchwork of security mechanisms can be 
redundant, requiring for instance parallel authentication steps at different layers or 
multiple encryptions. At the same time, some vulnerabilities are left open at other 
layers. This is well illustrated by the fact that end-to-end cryptography at higher layers 
is unable to prevent traffic-analysis attacks carried out at the lower layers, such that 
an opponent can learn about the topology of the network and the type of sessions it is 
carrying. 

Owing to the currently existing trust in the security levels assured by cryptographic 
mechanisms implemented at higher layers, it is fair to say that most available engineer- 
ing solutions do not exploit the full potential of the physical layer in increasing the 
overall security of the system. We start by providing a few motivating examples and 
then proceed with a system-oriented view of how physical-layer security inspired by 
information-theoretic security can be incorporated into the wireless communication
networks of today. 


Optical communication networks 

Communication networks based on optical fibers were among the first communication 
infrastructures to embrace physical-layer security. Since the information-bearing signals
are carried along the cable in the form of light guided by total internal reflection,
typical attacks are physical in nature and divert the light in order to disrupt the
communication service, degrade the quality of service, or extract information about the traffic and
its content. Although there is very little radiation from a fiber-optic cable that would 
allow non-intrusive eavesdropping, an attacker with access to a cable can cut the fiber 
or bend it in a way that allows light to be captured from or released into the fiber. 
Sophisticated attacks are capable of intercepting selected wavelengths while keeping 
others intact. Optical jamming can then take different forms, depending on whether the 
attacker chooses to inject noise, delayed versions of the captured signal (repeat-back 
jamming), or some other disrupting signal (correlated jamming). Even if the destination 
is capable of monitoring subtle fluctuations in received power, many of these attacks 
can go undetected. Physical-layer security solutions for optical communications include 
robust signaling schemes, coding, and active limitation of the power and bandwidth of 
input signals. Since the communication rates are extremely high, coding solutions that 
require complex processing are hard to implement and therefore very expensive. 


Spread spectrum 

It is fair to say that, much like other classes of communication networks, wireless 
systems have not been designed with security requirements as their chief concern and 
main figure of merit. There is, however, one notable exception: spread-spectrum (SS) 
systems. Since the modulation schemes in this class were invented first and foremost for 
military applications, their designers aimed at counteracting the possibility of an enemy 
attacker detecting and jamming the signals sent by the legitimate transmitter. The key 
idea is to use pseudo-random sequences to spread the original narrow-band signal over 
a wide band of frequencies, thus lowering the probability of interception and reducing 
the overall vulnerability to narrow-band jamming. At the receiver the wide-band signal 
is de-spread back to its original bandwidth, whereas the jamming signal is spread over a 
wide band of frequencies. This in turn reduces the power of the interference caused by 
the jammer in the frequency bandwidth of the original signal. 
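The jamming suppression described above can be checked numerically with a minimal baseband sketch. It assumes a length-31 m-sequence (recurrence a[k+5] = a[k+2] XOR a[k], i.e. the primitive polynomial x^5 + x^2 + 1) as the shared spreading code, no thermal noise, and a narrow-band jammer modelled as a tone at the carrier, which is a constant in complex baseband. Despreading multiplies the jammer by the code, so only its average over the code (here 1/31 of its amplitude) survives.

```python
def m_sequence_chips():
    """Length-31 m-sequence from a[k+5] = a[k+2] XOR a[k], mapped to +/-1 chips."""
    a = [1, 0, 0, 0, 0]
    while len(a) < 31:
        a.append(a[-3] ^ a[-5])
    return [1 - 2 * x for x in a]


def spread(bits, code):
    """Map each bit to +/-1 and multiply it by every chip of the spreading code."""
    chips = []
    for b in bits:
        symbol = 1 if b else -1
        chips.extend(symbol * c for c in code)
    return chips


def despread(rx, code):
    """Correlate each block of chips with the code and decide on the sign."""
    n = len(code)
    decisions = []
    for i in range(0, len(rx), n):
        corr = sum(r * c for r, c in zip(rx[i:i + n], code)) / n
        decisions.append(1 if corr > 0 else 0)
    return decisions


def jam(chips, amplitude):
    """Narrow-band jammer: a tone at the carrier, i.e. a constant in baseband."""
    return [x + amplitude for x in chips]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
code = m_sequence_chips()
A = 5.0  # jammer amplitude: 25 times the signal power (about 14 dB), assumed

narrow = despread(jam(spread(bits, [1]), A), [1])  # no spreading
wide = despread(jam(spread(bits, code), A), code)  # 31 chips per bit

print(narrow)        # [1, 1, 1, 1, 1, 1, 1, 1]: every 0 is lost to the jammer
print(wide == bits)  # True
```

Because the m-sequence is balanced (16 chips of one sign, 15 of the other), the residual jammer term after despreading is only -5/31 here, well below the unit signal amplitude.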

There are two main SS techniques: direct sequence spreading (DSS) and frequency 
hopping (FH). DSS modulates a signal $s_i$ onto a train of rectangular pulses $p(t)$ of
duration $T_c$, also called chips, whose values are governed by a pseudo-random spreading
sequence known both to the transmitter and to the receiver. The outcome of this process,


$$ s(t) = \sum_i s_i \, p(t - iT_c), $$


is then transmitted to the receiver. To recover the original signal, the receiver must mul- 
tiply the acquired signal once again by the spreading sequence. The main challenge is 
to synchronize the pseudo-random sequence used by the receiver with the one used by
the transmitter. It follows that spreading sequences must have a sharply peaked
autocorrelation function, so that the receiver can tell when its copy of the sequence is
aligned with the received one. Furthermore, sophisticated search and tracking algorithms
are needed in order to synchronize and to keep the synchronization between transmitter
and receiver. This in turn limits the attainable bandwidth of DSS signals.

As the name indicates, FH spreads the signal by shifting it to different frequencies 
that are dictated by a pseudo-random sequence. This operation can be expressed in the 
complex plane by 


$$ s(t) = \sum_i \exp\big(j(2\pi f_i t + \phi_i)\big) \, p(t - iT_h), $$


where $f_i$ denotes the frequency shift, $\phi_i$ is a random phase, and $T_h$ is the so-called
hop time. Similarly to DSS, the receiver must acquire and track the pseudo-random FH 
sequence. The modulation is called fast FH if the number of hops per transmitted symbol 
is equal to or above one. Otherwise, we speak of slow FH modulation, whereby multiple 
symbols are transmitted in each hop. 
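Both ends can only follow the same hop pattern if they derive it from shared information. The sketch below is a hypothetical scheme, not the one used by any particular standard: the channel for hop i is derived from HMAC-SHA256 of a shared key and the hop index, and a helper applies the fast/slow classification given in the text. The 79-channel hop set and the GSM-like rates in the usage line are assumptions chosen for illustration.

```python
import hashlib
import hmac


def hop_channels(key: bytes, n_hops: int, n_channels: int):
    """Hypothetical hop pattern: channel i = HMAC-SHA256(key, i) mod n_channels.
    The modulo reduction introduces a negligible bias for small n_channels."""
    pattern = []
    for i in range(n_hops):
        digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        pattern.append(int.from_bytes(digest[:4], "big") % n_channels)
    return pattern


def fh_mode(hop_rate_hz: float, symbol_rate_hz: float) -> str:
    """Classification used in the text: fast FH if there is at least one hop per symbol."""
    return "fast" if hop_rate_hz / symbol_rate_hz >= 1 else "slow"


key = b"shared-secret"                 # assumed to be known to both ends
tx_pattern = hop_channels(key, 10, 79)
rx_pattern = hop_channels(key, 10, 79)  # the receiver regenerates the same pattern
print(tx_pattern == rx_pattern)         # True
print(fh_mode(217.0, 270_833.0))        # slow: many symbols per hop, as in GSM
```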

Today, spread-spectrum techniques are used extensively in wireless networks both 
for military and for civilian use. Low-probability-of-interception (LPI) systems hide 
the transmissions by exploiting the fact that signals modulated according to pseudo- 
random spreading sequences are hard to distinguish from white noise. In cellular 
mobile communication networks, spread-spectrum techniques are used primarily for 
multiple-user channel accessing, whereby all users transmit simultaneously, albeit 
using different spreading sequences. Since the spreading sequences are orthogonal 
to each other, the receiver can recover the signal transmitted by each individual 
user by multiplying the received signal by the corresponding spreading sequence. 
Privacy can be enhanced by keeping the spreading sequences secret from potential 
attackers. 
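The multiple-access argument can be reproduced with orthogonal Walsh codes (rows of a Hadamard matrix), assuming perfectly synchronized users, unit channel gains, and no noise. All users transmit simultaneously, the channel adds their chip streams, and correlating with one user's code cancels the others exactly.

```python
def walsh(n):
    """Rows of the n x n Hadamard matrix (n a power of two): mutually orthogonal codes."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def despread(rx, code):
    """Correlate each block with the code; orthogonality removes the other users."""
    n = len(code)
    return [sum(r * c for r, c in zip(rx[i:i + n], code)) // n
            for i in range(0, len(rx), n)]


codes = walsh(4)
user_symbols = {0: [1, -1, 1], 1: [-1, -1, 1], 2: [1, 1, -1]}  # +/-1 symbols per user

n = len(codes[0])
n_sym = 3
rx = [0] * (n * n_sym)  # superposition of all users' chip streams
for u, symbols in user_symbols.items():
    for k, s in enumerate(symbols):
        for j, c in enumerate(codes[u]):
            rx[k * n + j] += s * c

for u in user_symbols:
    print(u, despread(rx, codes[u]))
# 0 [1, -1, 1]
# 1 [-1, -1, 1]
# 2 [1, 1, -1]
```

Anyone who knows a user's code can despread that user's signal in the same way, which is why keeping the sequences secret is what provides the privacy mentioned above.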


Mobile communication systems 
The Global System for Mobile Communications (GSM) standard is widely deployed 
around the world, offering mobile telephony and short-messaging services over wireless 
channels. Its security architecture is somewhat unconventional in that authentication is 
dealt with at the application level while confidentiality is implemented at the physi- 
cal layer. Data are thus encrypted after having been processed by the channel coding 
and interleaving blocks, as shown in Figure 7.3. The authentication follows a standard 
challenge-response protocol that relies on the shared secret embedded in the Subscriber
Identity Module (SIM). The encryption stage is based on a shared session key and a 
specific stream cipher, the so-called A5 algorithm. In addition, GSM uses frequency 
hopping as a means to combat multipath fading and to randomize co-channel interfer- 
ence. Since frequency hopping is a form of spread-spectrum communication, GSM is 
inherently robust against narrow-band jamming. 
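The challenge-response exchange can be sketched as follows. The GSM algorithms that compute the response and the session key (A3/A8) are chosen by the operator and are not reproduced here; HMAC-SHA256 is used purely as a hypothetical stand-in. The field sizes follow GSM: a 128-bit secret Ki in the SIM, a 128-bit random challenge RAND, a 32-bit response SRES, and a 64-bit session key Kc that would then key the A5 stream cipher.

```python
import hashlib
import hmac
import os


def a3_a8_standin(ki: bytes, rand: bytes):
    """Stand-in for the operator's A3/A8 algorithms (NOT the real ones):
    derive a 32-bit response SRES and a 64-bit session key Kc from Ki and RAND."""
    out = hmac.new(ki, rand, hashlib.sha256).digest()
    return out[:4], out[4:12]  # (SRES, Kc)


ki = os.urandom(16)    # 128-bit secret shared by the SIM and the network
rand = os.urandom(16)  # 128-bit challenge sent by the network over the air

# Network side: computes the expected response from its copy of Ki.
expected_sres, kc_network = a3_a8_standin(ki, rand)

# SIM side: answers the challenge with its own copy of Ki.
sres, kc_sim = a3_a8_standin(ki, rand)

authenticated = hmac.compare_digest(sres, expected_sres)
print(authenticated, kc_sim == kc_network)  # True True
```

Note that only the mobile station is authenticated here; the network never proves its identity to the SIM.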

The General Packet Radio Service (GPRS) extends the functionality of GSM to enable 
data communication at rates up to several tens of kbits/s. In contrast to GSM, the system 
designers of GPRS opted to place the encryption algorithm at the Logical Link Control
(LLC) layer instead of at the physical layer. Moreover, the algorithm uses a different
stream cipher. More recent standards for mobile broadband communications such as
the Universal Mobile Telecommunications System (UMTS) and Long Term Evolution
(LTE) also do not employ physical-layer security, except for the aforementioned
spread-spectrum techniques. The same is true for the Dedicated Short-Range
Communications (DSRC) standard for vehicular networks.

Figure 7.3 GSM architecture.


RFID and near-field communications 

Radio-frequency identification (RFID) tags are emerging as a convenient means to track 
objects, such as toll cards on highways, passenger luggage in airports, consumer goods in 
retail, and parcels in shipping businesses. Active, semi-passive, and passive RFID tags all 
share the same principle of operation: (1) the antenna of the tag receives electromagnetic
energy transmitted by the antenna of the RFID reader and (2) the tag sends its identifier
back to the reader via the radio channel. The range of transmission can take values up
to 100 m. Active tags use a battery to power both their circuitry and their transmissions,
whereas passive tags must use some of the power collected by their antenna during
reception. Semi-passive tags use battery power for the circuitry only. In a typical system,
the tag provides a 96-bit number (also known as an electronic product code, EPC) that
points to an entry in a database with restricted access. An attacker might be able to read
the EPC from the tag, but the rest of the information is stored and protected elsewhere.

Near-field communication (NFC) builds on the idea of RFID in order to enable 
high-rate data transfers at very short ranges. The applications envisioned include mak- 
ing payments for various services or sharing multimedia files among mobile phones. 
Several prototypes exploit inductive coupling at the physical layer. The NFC Forum
(an association of more than 130 companies interested in commercializing near-field
communication technology) makes the following claim on its web site: “Because the
transmission range is so short, NFC-enabled transactions are inherently secure. Also,
physical proximity of the device to the reader gives users the reassurance of being in
control of the process.”

Physical-layer security is so far absent from NFC and RFID technologies. A common 
assumption is that an eavesdropper will be farther away from the transmitting device 
than the legitimate reader. The eavesdropper is thus expected to have a worse signal-to- 
noise ratio (SNR) than that of the reader, which prevents the leakage of vital information 
through the electromagnetic waves traversing the channel. Clearly, this assumption is 
well captured by the wiretap-channel model, which leads us to believe that more secure 
NFC and RFID systems can be developed using secure channel codes and other physical- 
layer security techniques. 
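Under a Gaussian wiretap model (an assumption: the reader and the eavesdropper observe the same transmitted signal, with signal-to-noise ratios $\mathrm{SNR}_R$ and $\mathrm{SNR}_E$), the proximity argument becomes a statement about the secrecy capacity, measured in bits per channel use:

$$ C_s = \left[\tfrac{1}{2}\log_2\!\left(1+\mathrm{SNR}_R\right) - \tfrac{1}{2}\log_2\!\left(1+\mathrm{SNR}_E\right)\right]^+, \qquad [x]^+ = \max(x, 0). $$

$C_s$ is strictly positive exactly when $\mathrm{SNR}_R > \mathrm{SNR}_E$. For instance, with $\mathrm{SNR}_R = 15$ and $\mathrm{SNR}_E = 3$, $C_s = \tfrac{1}{2}\log_2 16 - \tfrac{1}{2}\log_2 4 = 2 - 1 = 1$ bit per channel use, whereas a reader-side advantage that vanishes ($\mathrm{SNR}_E \ge \mathrm{SNR}_R$) gives $C_s = 0$.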


Integrating physical-layer security into wireless systems 


From the previous sections it is clear that physical-layer security cannot be viewed as 
a panacea for solving all of the security concerns that exist in today’s networks. There 
is evidence, however, that careful design of the physical layer, promoting these security
concerns as fundamental performance criteria, will lead to wireless networks that are
arguably more secure than those whose security is rooted in cryptographic primitives
alone. We shall now investigate how this can be achieved by applying the
information-theoretic security principles described in detail in previous chapters.


Impact on the security architecture 

To increase the security of a wireless network in a bottom-up fashion, i.e. from the lowest 
layer to higher layers, the design of the physical layer must go beyond its traditional 
role of ensuring virtually error-free transmission across imperfect channels by means 
of powerful error-correction coding and adequate modulation schemes. Figure 7.4 illus- 
trates two new security functions that can be introduced at the physical layer, either as 
stand-alone features or in combination with the cryptographic mechanisms at higher lay- 
ers. One function consists of reducing the number of error-free bits that an eavesdropper 
can extract from the transmitted signal. This can be achieved in several scenarios of 
practical interest by using a special class of codes, which we call secure channel codes. 
The other function is concerned with exploiting the randomness of the communications 
channel to generate secret keys at the physical layer. The keys can then be passed on to
higher layers for use with existing cryptographic schemes. The next sections elaborate
further on the merits and drawbacks of these two methodologies.

Figure 7.4 Integration of physical-layer security into the network architecture.


Secure channel codes 

The key idea of the first approach presented in the previous section is to replace
typical channel codes and modulation schemes, which ensure reliability but not security,
with the class of code constructions described in Chapter 6. These can achieve secrecy and
reliable communication at a tolerable price in terms of achievable transmission rate and
computational complexity. The codes significantly degrade the amount of information
that an eavesdropper is capable of extracting from the observed signals, provided that
his SNR is lower than that of the legitimate receiver over the duration of a transmitted
codeword.
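The mechanism behind such codes can be seen in a classic toy example, which is far simpler than the constructions of Chapter 6: one message bit is sent as a uniformly random n-bit word whose parity equals the message, the random bits playing the role of local randomness. The sketch assumes that the legitimate receiver sees every bit while the eavesdropper's channel erases at least one of them; in that case both message values remain equally likely for her.

```python
import itertools
import random


def encode(m, n, rng):
    """Coset encoding of one bit: a uniformly random n-bit word with parity m."""
    word = [rng.randint(0, 1) for _ in range(n - 1)]
    word.append(m ^ (sum(word) % 2))
    return word


def decode(word):
    """The legitimate receiver (assumed error-free here) recovers m as the parity."""
    return sum(word) % 2


def eve_counts(observation):
    """Count the n-bit words consistent with Eve's observation (None = erasure)
    that correspond to m = 0 and to m = 1."""
    counts = [0, 0]
    for w in itertools.product((0, 1), repeat=len(observation)):
        if all(o is None or o == b for o, b in zip(observation, w)):
            counts[sum(w) % 2] += 1
    return counts


rng = random.Random(7)
x = encode(1, 4, rng)
print(decode(x))                  # 1
eve = x[:2] + [None] + x[3:]      # Eve misses a single bit
print(eve_counts(eve))            # [1, 1]: both messages equally likely
```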

While it is fairly straightforward to discover the state of the channel between legitimate 
communication partners by means of pilot symbols, as explained below, the same is 
obviously not true for the eavesdropper, who is likely to hide his presence or at the very 
least conceal his location. We are thus left with the options of (a) transmitting only when 
the SNR is high (and therefore more likely to exceed the SNR of the eavesdropper), 
(b) ensuring by physical means that the eavesdropper cannot place an antenna within a 
certain range, or (c) degrading the eavesdropper’s reception by means of jamming the 
regions where his antenna is assumed to be located, as explained in Chapter 8. 

Secure channel codes can be used at the physical layer in a modular fashion, i.e. 
independently of the security architecture implemented at higher layers. If the sent 
datagrams are protected by cryptographic primitives, be it through symmetric encryption 
or public-key cryptography, the use of secure channel codes can strengthen the overall 
security level significantly by virtue of the fact that the eavesdropper is no longer able 
to obtain an error-free copy of the cryptogram. Moreover, even less sophisticated secure
channel codes, which increase the error probability at the eavesdropper rather than the
total equivocation (or uncertainty) about the sent information bits, can force the attacker
to deal with a large number of erroneous symbols that are virtually indistinguishable
from the scrambled bits resulting from cryptographic operations. Therefore, classical
tools, such as differential and linear cryptanalysis, no longer apply and must be modified
to account for bit errors in the encrypted stream. In some cases, the best that can be
achieved is to form a list of possible message sequences. It is to be expected that, in
most cases of interest, the task of the enemy cryptanalyst will become significantly more
complex.

Figure 7.5 Protocol for secret-key agreement.
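To get a feeling for how fast such a list grows, assume (hypothetically) that Eve holds an n-bit block of the cryptogram with at most t bit errors at unknown positions. Every word within Hamming distance t of her observation is then a candidate, so the list has the size of a Hamming ball. The block length of 128 bits in the usage line is an illustrative assumption.

```python
from math import comb, log2


def candidate_list_size(n_bits: int, max_errors: int) -> int:
    """Number of n-bit words within Hamming distance max_errors of Eve's
    observation, i.e. candidates she must test if she only knows the error bound."""
    return sum(comb(n_bits, k) for k in range(max_errors + 1))


for t in (0, 1, 5, 10):
    size = candidate_list_size(128, t)
    print(f"t = {t:2d}: {size} candidates (about 2^{log2(size):.1f})")
```

Each candidate must in turn be processed by the cryptanalytic attack, so even a handful of residual errors per block multiplies the attacker's work considerably.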

When secure channel codes are used in an opportunistic manner depending on current 
estimates of the SNR, transmissions occur only if the received power is above a certain 
threshold. Communication rules such as these will obviously affect the transmission 
schedule and the access to the channel. This suggests joint design of the physical and 
link layers, as well as an optional adaptation of routing and retransmission at the network 
and transport layers, respectively. 
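The scheduling impact of such a threshold rule can be estimated quickly. The sketch assumes Rayleigh fading, so that the instantaneous SNR is exponentially distributed, and compares a Monte Carlo estimate of the fraction of usable slots with the closed form exp(-threshold / mean SNR). The average SNR of 10 dB and threshold of about 13 dB are illustrative assumptions.

```python
import math
import random


def fraction_of_slots_used(mean_snr, threshold, n_slots=100_000, seed=0):
    """Monte Carlo estimate of the fraction of slots in which a threshold-based
    scheduler transmits, assuming Rayleigh fading (exponentially distributed SNR)."""
    rng = random.Random(seed)
    used = sum(rng.expovariate(1.0 / mean_snr) > threshold for _ in range(n_slots))
    return used / n_slots


mean_snr = 10.0   # average SNR in linear units (10 dB), assumed
threshold = 20.0  # transmit only above about 13 dB, assumed
estimate = fraction_of_slots_used(mean_snr, threshold)
exact = math.exp(-threshold / mean_snr)  # closed form for Rayleigh fading
print(f"simulated {estimate:.3f}, closed form {exact:.3f}")
```

With these numbers only about 14% of the slots qualify, which illustrates why the medium-access and retransmission logic has to be aware of the security-driven transmission rule.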

Insofar as the actual implementation of secure channel codes in real systems is 
concerned, the main challenges will of course vary from platform to platform. The 
transmission schemes at the physical layer are typically implemented in a device driver, 
e.g. for the wireless interface card. With the proliferation of software-defined radios, it 
is reasonable to assume that security-oriented mechanisms at the physical layer will be 
easier to integrate into future communication systems. 


Secret-key agreement at the physical layer 

The second security function that can be implemented at the physical layer is secret- 
key agreement. As explained in Chapter 4, after sharing some common randomness in 
the form of correlated symbols, Alice and Bob can agree on a secret key by means 
of reconciliation and privacy amplification. Figure 7.5 illustrates a wireless security 
protocol that achieves this purpose. When the estimated secrecy capacity $C_s$ and main
channel capacity $C_m$ from Alice to Bob are found to be above the prescribed thresholds,
t and x, respectively, Alice transmits Gaussian-shaped symbols $X^n$ to Bob, who observes
the channel outputs $Y^n$. The two sequences of Gaussian symbols are correlated and form
the core of common randomness from which Alice and Bob can generate their key.
Using a quantizer with a bit mapping, Alice obtains a sequence of bits. To ensure that
the two legitimate partners share exactly the same sequence from the observed symbols,
Alice sends some side information in the form of a number of parity-check bits computed
with a low-density parity-check (LDPC) code (using, for instance, the reconciliation
protocol described in Section 6.5.2), which are protected by a powerful channel code. Bob
is thus able to receive these bits virtually without error, and feeds them to a
belief-propagation decoder for reconciliation. The side information can be sent at any
time, even when Eve has a better channel than Bob. To ensure that Eve is unable to
extract any information about the key from the
noisy symbols and the side information she has access to, Alice and Bob use privacy 
amplification, thus arriving at a smaller sequence of secret bits, which can be used to 
communicate securely. 
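The quantization and privacy-amplification steps can be sketched as follows, under several simplifying assumptions: a one-bit sign quantizer stands in for the quantizer with bit mapping, the main channel has an SNR of 10 dB, reconciliation (LDPC-based in the text) is not implemented and is simply assumed to succeed, and privacy amplification uses a random binary matrix, i.e. a hash drawn from a 2-universal family, whose seed is agreed publicly. The output length of 256 bits is illustrative; in practice it is set by an estimate of Eve's information.

```python
import random


def quantize(samples):
    """One-bit quantizer: bit = 1 for positive samples."""
    return [1 if s > 0 else 0 for s in samples]


def privacy_amplification(bits, out_len, seed):
    """Compress bits with a random binary matrix (2-universal hash family).
    The seed selecting the matrix can be made public after reconciliation."""
    rng = random.Random(seed)
    key = []
    for _ in range(out_len):
        row = [rng.randint(0, 1) for _ in bits]
        key.append(sum(r & b for r, b in zip(row, bits)) % 2)
    return key


rng = random.Random(42)
n = 2000
P, noise_var = 1.0, 0.1  # 10 dB main-channel SNR, assumed
x = [rng.gauss(0.0, P ** 0.5) for _ in range(n)]             # Alice's symbols X^n
y = [xi + rng.gauss(0.0, noise_var ** 0.5) for xi in x]      # Bob's observation Y^n

a_bits, b_bits = quantize(x), quantize(y)
mismatch = sum(a != b for a, b in zip(a_bits, b_bits)) / n
print(f"bit disagreement before reconciliation: {mismatch:.3f}")

# Reconciliation is assumed to have succeeded: Bob now holds Alice's bits.
b_reconciled = list(a_bits)
k_alice = privacy_amplification(a_bits, 256, seed=2024)
k_bob = privacy_amplification(b_reconciled, 256, seed=2024)
print(k_alice == k_bob, len(k_alice))  # True 256
```

For the sign quantizer the expected disagreement is arctan(sqrt(0.1))/pi, roughly 10%, which is the error rate the reconciliation code would have to correct.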

It has been shown that, with a fraction of time dedicated to secret-key generation 
as small as 1%, such a secret-key agreement protocol can renew a 256-bit encryption 
key every 25 kbits with SNRs of 10 dB and 20 dB for the main and eavesdropper
channels, respectively; when transmitting at an average rate of 2 Mbps, a secret key could
be replaced by a new random key every 16 milliseconds. Although these estimates may
be optimistic, continuous generation of secret keys at the physical layer emerges as
a promising solution to the key-reuse problem described earlier.

As in the case of secure channel codes, we are confronted with the need to modify 
the physical layer, which implies redesigning a wireless interface card and writing 
a new device driver. Again, software-defined radio may prove valuable in adapting 
the physical layer for security functionalities. Since common randomness is shared in 
an opportunistic way, the medium-access control at the link layer, which governs the 
scheduling of transmissions on the basis of channel state information, will have to 
be adapted accordingly. This calls for cross-layer design of security protocols, which 
must ensure also that higher layers are capable of acquiring and using the secret keys 
generated at the lowest layer. In current systems, this involves altering the communication 
and security components of the operating system. 


The role of channel state information 

The objective of physical-layer security is for Alice and Bob to communicate reliably
at a certain target rate while leaking the least possible number of information bits to 
Eve. If the SNRs both of the main channel and of the eavesdropper’s channel are known 
perfectly to Alice, she can compute the secrecy capacity and adjust her transmissions 
accordingly. When the secrecy capacity turns out to be zero, Alice can decide not to 
transmit, thus avoiding the likely disclosure of information. On the other hand, when 
the secrecy capacity is strictly positive, Alice can choose a secure channel code that 
operates at that rate and achieve information-theoretic security. Alternatively, she can 
share common randomness with Bob, when the secrecy capacity is positive, and then 
generate a secret key. Either way, knowledge of the state of both channels is necessary 
for assuring information-theoretic security. 
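The decision logic of this paragraph can be written as a small function, assuming Gaussian channels whose SNRs Alice knows perfectly and a hypothetical menu of secure-code rates (in bits per channel use). Alice computes the secrecy capacity, picks the highest code rate that does not exceed it, and stays silent otherwise.

```python
import math


def secrecy_capacity(snr_main, snr_eve):
    """Gaussian wiretap secrecy capacity in bits per channel use (linear SNRs)."""
    return max(0.0, 0.5 * math.log2(1 + snr_main) - 0.5 * math.log2(1 + snr_eve))


def choose_action(snr_main, snr_eve, code_rates):
    """Hypothetical rule: use the fastest available secure code with rate <= C_s,
    or wait when C_s is zero or no available code is slow enough."""
    cs = secrecy_capacity(snr_main, snr_eve)
    usable = [r for r in code_rates if r <= cs]
    if cs == 0.0 or not usable:
        return ("wait", cs)
    return ("transmit", max(usable))


RATES = [0.25, 0.5, 1.0]  # rates of the available secure channel codes, assumed
print(choose_action(63.0, 7.0, RATES))  # ('transmit', 1.0): C_s = 3 - 1.5 = 1.5
print(choose_action(7.0, 63.0, RATES))  # ('wait', 0.0): Eve's channel is better
```

Practical codes operate with some margin below capacity, so a real rule would compare against a backed-off value of $C_s$ rather than $C_s$ itself.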
Estimating the channel between the legitimate users is common practice in the most
widespread wireless communication systems. Typically, each burst contains a known 
training sequence that is transmitted together with the coded bits. In GSM, for example, 
training bits account for about 20% of the traffic. After detecting the transmitted signals, 
the receiver can use the training bits and the received signal samples to estimate the 
channel impulse response. The most common algorithms are the least-squares and linear 
minimum mean-squared error estimators. Often a feedback channel is available, over 
which the receiver can inform the transmitter about the state of the channel by sending 
either the received signal or quantized values of the main channel parameters. In some 
cases, such as time-division duplex (TDD) systems, in which the coherence time of the
channel is sufficiently long, the channels from transmitter to receiver and vice versa
are essentially identical. This form of reciprocity obviously simplifies the sharing of
channel state information.
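For a flat-fading channel with a single complex coefficient h, the least-squares estimate from a known training sequence has a one-line closed form. The sketch assumes BPSK training symbols (26 of them, the length of a GSM training sequence), additive complex Gaussian noise, and a multipath-free channel; estimating a full impulse response would require solving a small linear system instead.

```python
import cmath
import random


def ls_estimate(training, received):
    """Least-squares estimate of h from y_k = h * x_k + noise:
    h_hat = sum(y_k * conj(x_k)) / sum(|x_k|^2)."""
    num = sum(y * x.conjugate() for x, y in zip(training, received))
    den = sum(abs(x) ** 2 for x in training)
    return num / den


rng = random.Random(3)
training = [complex(rng.choice((-1, 1)), 0) for _ in range(26)]  # BPSK training burst
h = cmath.rect(0.8, 0.6)  # true channel: gain 0.8, phase 0.6 rad (assumed)
sigma = 0.1               # noise standard deviation per real dimension (assumed)
received = [h * x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in training]

h_hat = ls_estimate(training, received)
print(f"estimate {h_hat:.3f}, error {abs(h_hat - h):.3f}")
```

The estimation error shrinks with the square root of the training length, which is the trade-off behind the training overhead mentioned above.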

In some scenarios, the legitimate communication partners may have partial knowledge 
of the eavesdropper’s channel. This corresponds, for instance, to the situation in which 
Eve is another active user in the wireless network (e.g. in a TDD environment), so that 
Alice can estimate the eavesdropper’s channel during Eve’s transmissions. Naturally, an 
attacker that aims to intercept or disturb the communication between the legitimate part- 
ners cannot be expected to participate in the channel-estimation process or to follow any 
conventions set by the communication protocol. One possible attack consists of sending 
low-power signals, which lead Alice and Bob to underestimate the SNR at Eve’s receiver. 
If Eve behaves as a passive attacker, no training signals are available for estimating the 
state of the eavesdropper’s channel. In this case, the best that Alice and Bob can do is 
to use a conservative estimate based, for example, on the verifiable assumption that Eve 
is located outside a certain physical perimeter. Studies show that protocols such as the
one illustrated in Figure 7.5 are quite robust against channel-estimation errors.
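The conservative estimate based on a physical perimeter can be sketched by placing Eve at the closest admissible distance. The log-distance path-loss model, the reference SNR of 30 dB at 1 m, and the path-loss exponent of 3 are all assumptions; fading is ignored, so the result is only a rough design figure.

```python
import math


def snr_at(distance_m, snr_at_1m, path_loss_exp):
    """Log-distance path-loss model (assumed): SNR decays as distance^-alpha."""
    return snr_at_1m * distance_m ** (-path_loss_exp)


def worst_case_secrecy_rate(d_bob, d_min_eve, snr_at_1m, alpha):
    """Secrecy capacity with Eve placed on the boundary of the secured perimeter,
    i.e. at the closest distance she is assumed to be able to reach."""
    snr_b = snr_at(d_bob, snr_at_1m, alpha)
    snr_e = snr_at(d_min_eve, snr_at_1m, alpha)
    return max(0.0, 0.5 * math.log2(1 + snr_b) - 0.5 * math.log2(1 + snr_e))


# Bob at 2 m, Eve assumed to stay at least 10 m away, 30 dB SNR at 1 m, alpha = 3.
print(round(worst_case_secrecy_rate(2.0, 10.0, 1000.0, 3.0), 2))  # 2.99
```

A wider perimeter raises the guaranteed rate, while an eavesdropper who can get closer than Bob drives it to zero.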


Authentication requirements 
We showed how secure channel codes can be used to increase the levels of confidentiality 
provided by a wireless network and how secret-key agreement at the physical layer 
helps solve the key-reuse problem by exploiting the randomness of the communications
channel. In all cases, it is assumed that the attacker is passive and does not try to 
impersonate either Alice or Bob. In most scenarios of interest, which do not grant 
this type of assurance, all communications must be authenticated, otherwise an active 
attacker is able not only to intercept the datagrams sent over the wireless channel but 
also to transmit fake signals aimed at confusing Alice and Bob. In the simplest instance, 
an attacker could disrupt the communications by jamming their transmissions, which 
can be viewed as a form of DoS. Alternatively, an impersonation attack would allow the 
attacker to play the role of the man-in-the-middle, for example generating secret keys 
with Bob while pretending to be Alice, and vice versa, and modifying the exchanged 
messages without being noticed. 

Notice that a small shared secret is sufficient to authenticate the first transmission. 
Subsequent transmissions can then be authenticated using the secret keys that can be 
generated at the physical layer through common randomness, reconciliation, and privacy 
amplification. Despite our ability to authenticate subsequent transmissions, physical-layer
security as it stands does not solve the key-distribution problem and we are still left
with the question of how to share the initial secret key, which is critical for bootstrapping 
the security sub-systems of the wireless communication network. 

One way is to rely on the traditional approach using public-key cryptography. More 
specifically, nodes in the network would rely on a public-key infrastructure (PKI) to 
obtain authenticated public keys from a trusted certification authority and then use these 
public keys to share the initial secret key, which allows them to authenticate the first 
transmission and thus establish a secure link. This approach obviously inherits all the 
flaws of public-key cryptography, most notably the reliance on computational security 
and the complexity of establishing a PKI. 

If Alice (or Bob) can be trusted to be the first one to establish a connection, then it is
possible to authenticate transmissions at the physical layer using the impulse response 
of the channel. The key idea is that Alice sends a known probing signal, which Bob 
can use to estimate the channel’s impulse response and infer the channel state, more 
specifically the instantaneous fading coefficient. Since it has been shown experimentally 
that a probing signal sent by a different sender transmitting from a different position is 
going to generate a different impulse response with overwhelming probability, Bob will 
be able to detect any attempt by Eve to impersonate Alice after the first transmission. All 
he needs to do is to compare the observed impulse responses for different transmissions. 
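The comparison Bob performs can be sketched as a normalized distance test between impulse-response estimates. The channel model (independent complex Gaussian taps, static between frames), the estimation-noise level, and the acceptance threshold of 0.1 are hypothetical; an actual system would calibrate the threshold from measured channel statistics.

```python
import random


def normalized_distance(h_ref, h_new):
    """Squared distance between two impulse-response estimates, normalized by
    the energy of the reference estimate."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_ref, h_new))
    den = sum(abs(a) ** 2 for a in h_ref)
    return num / den


def same_sender(h_ref, h_new, threshold=0.1):
    """Hypothetical test: accept a frame as Alice's if its channel estimate is
    close to the one stored after her first, trusted transmission."""
    return normalized_distance(h_ref, h_new) < threshold


rng = random.Random(5)


def random_cir(taps=4):
    # Independent complex Gaussian taps: a stand-in for a rich-scattering channel.
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(taps)]


def noisy(h, sigma=0.05):
    # Channel estimate = true impulse response plus estimation noise.
    return [t + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for t in h]


h_alice = random_cir()       # Alice's channel to Bob
h_eve = random_cir()         # Eve transmits from a different position
reference = noisy(h_alice)   # stored after the first, trusted transmission

print(same_sender(reference, noisy(h_alice)))  # genuine frame from Alice
print(same_sender(reference, noisy(h_eve)))    # impersonation attempt by Eve
```

With these parameters a genuine frame typically yields a distance well below 0.01, while a frame from another position yields a distance on the order of 1, so the two cases are easy to separate.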
Part IV


Other applications of 
information-theoretic security 


Secrecy and jamming in 
multi-user channels 


In all of the previous chapters, we discussed the possibility of secure transmissions 
at the physical layer for communication models involving only two legitimate parties 
and a single eavesdropper. These results generalize in part to situations with more com- 
plex communication schemes, additional legitimate parties, or additional eavesdroppers. 
Because of the increased complexity of these “multi-user” channel models, the results 
one can hope to obtain are, in general, not as precise as the ones obtained in earlier chap- 
ters. In particular, it is seldom possible to obtain a single-letter characterization
of the secrecy capacity and one must often resort to the calculation of upper and lower 
bounds. Nevertheless, the analysis of multi-user communication channels still provides 
useful insight into the design of secure communication schemes; in particular it high- 
lights several characteristics of secure communications, most notably the importance 
of cooperation, feedback, and interference. Although these aspects have been studied 
extensively in the context of reliable communications and are now reasonably well under- 
stood, they do not necessarily affect secure communications in the same way as they 
affect reliable communications. For instance, while it is well known that cooperation 
among transmitters is beneficial and improves reliability, the fact that interference is also 
helpful for secrecy is perhaps counter-intuitive. 

There are numerous variations of multi-user channel models with secrecy constraints; 
rather than enumerating them all, we study the problem of secure communication over 
a two-way Gaussian wiretap channel. This model exemplifies the specific features of 
most multi-user secure communication systems and its analysis directly leverages the 
techniques and results presented in previous chapters. We refer the interested reader to 
the appendix at the end of this chapter and to the monograph of Liang et al. [147] for an 
extensive list of references on multi-user secure communications. 

We start this chapter by introducing the two-way Gaussian wiretap channel 
(Section 8.1). We then discuss in detail three secure communication strategies: coopera- 
tive jamming (Section 8.2), coded cooperative jamming (Section 8.3), and key exchange 
(Section 8.4). 
Figure 8.1 Two-way Gaussian wiretap channel with interference at the eavesdropper. $M_1$ and $M_2$
represent the messages transmitted by Alice and Bob. $R_1$ and $R_2$ represent local randomness used
in the encoders.


Two-way Gaussian wiretap channel 


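As shown in Corollary 5.1, the secrecy capacity of a Gaussian wiretap channel is given by
$$C_s = \left(\frac{1}{2}\log\left(1+\frac{P}{\sigma_m^2}\right) - \frac{1}{2}\log\left(1+\frac{P}{\sigma_e^2}\right)\right)^+,$$
and the secrecy capacity is zero if $\sigma_m^2 \geq \sigma_e^2$, independently of the transmit power $P$.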
To improve secure communication rates, one should either increase the signal-to-noise
ratio (SNR) of the legitimate receiver or decrease the SNR of the eavesdropper. A
natural approach by which to achieve the latter is to introduce interferers into the
system. In particular, if the eavesdropper happens to be located closer to the interferers
than the legitimate receiver, interference may have a more detrimental effect on her
than on the legitimate receiver, which can result in increased secure communication
rates. Notice that this approach implicitly requires knowledge of the locations of all of
the transmitters and receivers, or some knowledge of all of the instantaneous channel
characteristics, so that interferers do not harm the legitimate receiver unnecessarily. In
practice, this knowledge would be obtained via some cooperation mechanism between
nodes; therefore, Tekin and Yener called this approach cooperative jamming to
highlight the importance of interfering intelligently.
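To make the effect of interference concrete, the sketch below evaluates the Gaussian secrecy capacity given above (with base-2 logarithms, so rates are in bits per channel use) and then models an interferer as extra Gaussian noise of power `J` that reaches Eve but not the legitimate receiver. The function name and the value of `J` are illustrative assumptions, not part of the original model.

```python
import math

def gaussian_secrecy_capacity(P, sigma_m2, sigma_e2):
    """Secrecy capacity (bits per channel use) of a Gaussian wiretap channel
    with transmit power P and noise variances sigma_m2 (Bob) and sigma_e2 (Eve)."""
    c_main = 0.5 * math.log2(1 + P / sigma_m2)
    c_eve = 0.5 * math.log2(1 + P / sigma_e2)
    return max(c_main - c_eve, 0.0)

P, sigma_m2, sigma_e2 = 10.0, 1.0, 0.5
# Eve's noise is smaller than Bob's: no secrecy is possible.
print(gaussian_secrecy_capacity(P, sigma_m2, sigma_e2))       # 0.0

# Illustrative assumption: an interferer adds Gaussian noise of power J at Eve only.
J = 2.0
print(gaussian_secrecy_capacity(P, sigma_m2, sigma_e2 + J))   # about 0.569
```

Treating the interference as additional Gaussian noise at Eve is exactly the situation exploited by cooperative jamming with Gaussian noise later in this chapter.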

The concept of cooperative jamming can be applied in many different settings, and 
we refer the reader to the bibliographical notes for examples. In this chapter, we restrict 
our attention to the situation in which Alice or Bob plays the role of the interferer. 
Specifically, we consider the channel model illustrated in Figure 8.1, in which Alice
and Bob communicate in full duplex over orthogonal Gaussian channels while their
signals interfere at Eve's terminal. We call this model a two-way Gaussian wiretap
channel (TWWTC for short). At every time instant $i$, Alice transmits symbol $X_{1,i}$
while receiving $Y_{1,i}$, Bob transmits $X_{2,i}$ while receiving $Y_{2,i}$, and Eve observes $Z_i$. The
relationships among these symbols are given by


$$\begin{aligned}
Y_{2,i} &= X_{1,i} + N_{1,i},\\
Y_{1,i} &= X_{2,i} + N_{2,i},\\
Z_i &= \sqrt{h_1}X_{1,i} + \sqrt{h_2}X_{2,i} + N_{e,i}.
\end{aligned}\tag{8.1}$$


The processes $\{N_{1,i}\}_{i\geq1}$, $\{N_{2,i}\}_{i\geq1}$, and $\{N_{e,i}\}_{i\geq1}$ are i.i.d. and distributed according
to $\mathcal{N}(0, 1)$; the real channel gains $h_1 > 0$ and $h_2 > 0$ account for the position of the
eavesdropper with respect to Alice and Bob and are assumed known to all parties,
including the eavesdropper. The inputs $X_1^n$ and $X_2^n$ to the channel are subject to the
average power constraints
$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[X_{1,i}^2\right] \leq P_1 \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[X_{2,i}^2\right] \leq P_2.$$


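Remark 8.1. For the sake of generality, we could introduce gains $g_1$ and $g_2$ on the
forward and backward channels between Alice and Bob and introduce different noise
variances:
$$\begin{aligned}
Y_{2,i} &= \sqrt{g_1}X_{1,i} + N_{1,i},\\
Y_{1,i} &= \sqrt{g_2}X_{2,i} + N_{2,i},\\
Z_i &= \sqrt{h_1}X_{1,i} + \sqrt{h_2}X_{2,i} + N_{e,i},
\end{aligned}$$
where $\{N_{1,i}\}_{i\geq1}$, $\{N_{2,i}\}_{i\geq1}$, and $\{N_{e,i}\}_{i\geq1}$ are i.i.d. zero-mean Gaussian noises with
variances $\sigma_1^2$, $\sigma_2^2$, and $\sigma_e^2$, respectively. Nevertheless, by scaling the signals as
$\tilde{Y}_{2,i} = (1/\sigma_1)Y_{2,i}$, $\tilde{X}_{1,i} = \sqrt{g_1/\sigma_1^2}\,X_{1,i}$, $\tilde{Y}_{1,i} = (1/\sigma_2)Y_{1,i}$, $\tilde{X}_{2,i} = \sqrt{g_2/\sigma_2^2}\,X_{2,i}$, and
$\tilde{Z}_i = (1/\sigma_e)Z_i$, introducing channel gains $\tilde{h}_1 = h_1\sigma_1^2/(g_1\sigma_e^2)$ and $\tilde{h}_2 = h_2\sigma_2^2/(g_2\sigma_e^2)$, and
redefining the power constraints as $\tilde{P}_1 = (g_1/\sigma_1^2)P_1$ and $\tilde{P}_2 = (g_2/\sigma_2^2)P_2$, one can check
that we can always revert to the more tractable model given in (8.1).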
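The rescaling in Remark 8.1 can be checked numerically. The sketch below uses a hypothetical helper, `normalize_twwtc`, that maps the parameters of the general model to those of the normalized model and then confirms that the SNRs seen by Bob, Alice, and Eve are unchanged, which is why nothing is lost by working with (8.1). The parameter values are arbitrary.

```python
def normalize_twwtc(g1, g2, h1, h2, s1_2, s2_2, se_2, P1, P2):
    """Hypothetical helper: parameters of the general model of Remark 8.1
    (noise variances s1_2 = sigma_1^2, s2_2 = sigma_2^2, se_2 = sigma_e^2)
    mapped to the gains and powers of the normalized model (8.1)."""
    h1_t = h1 * s1_2 / (g1 * se_2)
    h2_t = h2 * s2_2 / (g2 * se_2)
    P1_t = g1 * P1 / s1_2
    P2_t = g2 * P2 / s2_2
    return h1_t, h2_t, P1_t, P2_t

# Arbitrary example values for the general model.
g1, g2, h1, h2 = 2.0, 0.5, 0.8, 1.5
s1_2, s2_2, se_2 = 0.5, 2.0, 3.0
P1, P2 = 10.0, 4.0
h1_t, h2_t, P1_t, P2_t = normalize_twwtc(g1, g2, h1, h2, s1_2, s2_2, se_2, P1, P2)

# SNR of Alice's signal at Bob and at Eve, before and after normalization.
print(g1 * P1 / s1_2, P1_t)
print(h1 * P1 / se_2, h1_t * P1_t)
# SNR of Bob's signal at Alice and at Eve, before and after normalization.
print(g2 * P2 / s2_2, P2_t)
print(h2 * P2 / se_2, h2_t * P2_t)
```

Each printed pair should agree up to floating-point rounding.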


The orthogonality of the channels between Alice and Bob implicitly relies on the 
assumption that any self-interference can be perfectly canceled out, and the form of 
the interference at the eavesdropper’s location is valid provided that all signals are 
synchronized. Realistically, interfering signals are unlikely to be perfectly synchronized 
when they reach the eavesdropper; nevertheless, the effect of mis-synchronization can 
be partly accounted for in the magnitudes of the coefficients $h_1$ and $h_2$.

The ability to achieve secure communications over the TWWTC relies once more on
the use of stochastic encoders. As was done in Chapter 3 and Chapter 4, it is convenient
to introduce the randomness in the encoder explicitly and to assume that Alice has access
to the realizations of a DMS $(\mathcal{R}_1, p_{R_1})$ while Bob has access to the realizations of a DMS
$(\mathcal{R}_2, p_{R_2})$; the DMSs are independent of each other and of the noise in the channel. For
clarity in the definitions, we also denote the alphabets in which the symbols $X_1$, $X_2$, $Y_1$,
and $Y_2$ take their values by the letters $\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{Y}_1$, and $\mathcal{Y}_2$, respectively. A generic code
for the two-way wiretap channel is then defined as follows.
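Definition 8.1. A $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ for the TWWTC consists of

• two message sets, $\mathcal{M}_1 = [1, 2^{nR_1}]$ and $\mathcal{M}_2 = [1, 2^{nR_2}]$;

• two sources of local randomness, $(\mathcal{R}_1, p_{R_1})$ and $(\mathcal{R}_2, p_{R_2})$;

• two sequences of encoding functions, $f_{1,i}: \mathcal{M}_1 \times \mathcal{R}_1 \times \mathcal{Y}_1^{i-1} \to \mathcal{X}_1$ and
$f_{2,i}: \mathcal{M}_2 \times \mathcal{R}_2 \times \mathcal{Y}_2^{i-1} \to \mathcal{X}_2$ for $i \in [1, n]$, which generate symbols based on
the message to transmit, local randomness, and previous observations;

• two decoding functions, $g_1: \mathcal{Y}_1^n \times \mathcal{R}_1 \times \mathcal{M}_1 \to \mathcal{M}_2 \cup \{?\}$ and $g_2: \mathcal{Y}_2^n \times \mathcal{R}_2 \times
\mathcal{M}_2 \to \mathcal{M}_1 \cup \{?\}$.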


Note that the DMSs $(\mathcal{R}_1, p_{R_1})$ and $(\mathcal{R}_2, p_{R_2})$ can be optimized as part of the code
design. The $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ is assumed known by Alice, Bob, and Eve. We
also assume that the messages $M_1 \in \mathcal{M}_1$ and $M_2 \in \mathcal{M}_2$ are independent and uniformly
distributed in their respective sets. The reliability performance of $\mathcal{C}_n$ is then measured
in terms of the probability of error
$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\left[\hat{M}_2 \neq M_2 \text{ or } \hat{M}_1 \neq M_1 \,\middle|\, \mathcal{C}_n\right],$$
while its secrecy performance is measured in terms of the leakage
$$L(\mathcal{C}_n) \triangleq I(M_1M_2; Z^n \mid \mathcal{C}_n h_1 h_2).$$
The conditioning on $h_1$ and $h_2$ reflects the fact that the channel gains are known to the
eavesdropper; however, we write $I(M_1M_2; Z^n \mid \mathcal{C}_n)$ to simplify the notation.


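Definition 8.2. A rate pair $(R_1, R_2)$ is achievable for the TWWTC if there exists a
sequence of $(2^{nR_1}, 2^{nR_2}, n)$ codes $\{\mathcal{C}_n\}_{n\geq1}$ such that
$$\lim_{n\to\infty} P_e(\mathcal{C}_n) = 0 \quad \text{(reliability condition)}, \tag{8.2}$$
$$\lim_{n\to\infty} \frac{1}{n}L(\mathcal{C}_n) = 0 \quad \text{(weak secrecy condition)}. \tag{8.3}$$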


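Note that the secrecy condition requires a vanishing information rate leaked to the
eavesdropper for messages $M_1$ and $M_2$ jointly, which is a stronger requirement than
a vanishing information rate for messages $M_1$ and $M_2$ individually. In fact, the chain
rule of mutual information and the independence of messages $M_1$ and $M_2$ guarantee
that
$$\begin{aligned}
I(M_1M_2; Z^n \mid \mathcal{C}_n) &= I(M_1; Z^n \mid \mathcal{C}_n) + I(M_2; Z^n \mid M_1\mathcal{C}_n)\\
&\geq I(M_1; Z^n \mid \mathcal{C}_n) + I(M_2; Z^n \mid \mathcal{C}_n),
\end{aligned}$$
where the inequality holds because, $M_1$ and $M_2$ being independent, $I(M_2; Z^n \mid M_1\mathcal{C}_n) = I(M_2; Z^nM_1 \mid \mathcal{C}_n) \geq I(M_2; Z^n \mid \mathcal{C}_n)$.
Therefore, messages are protected individually if they are protected jointly, but the
converse need not be true.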
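A toy example, which is not a TWWTC code, shows why the converse can fail: let $M_1$ and $M_2$ be independent uniform bits and suppose Eve observes only $Z = M_1 \oplus M_2$. The sketch below computes mutual informations (in bits) directly from the joint probability mass function; each message is individually perfectly hidden, yet Eve learns one full bit about the pair.

```python
import itertools
import math
from collections import defaultdict

def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits from a joint pmf {outcome_tuple: prob};
    a_idx and b_idx select the coordinates forming A and B."""
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        pa[a] += p
        pb[b] += p
        pab[(a, b)] += p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in pab.items() if p > 0)

# Outcomes (m1, m2, z) with independent uniform bits and z = m1 XOR m2.
joint = {(m1, m2, m1 ^ m2): 0.25 for m1, m2 in itertools.product((0, 1), repeat=2)}
I1 = mutual_information(joint, (0,), (2,))      # I(M1; Z)
I2 = mutual_information(joint, (1,), (2,))      # I(M2; Z)
I12 = mutual_information(joint, (0, 1), (2,))   # I(M1 M2; Z)
print(I1, I2, I12)   # 0.0 0.0 1.0
```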

We are interested in characterizing the entire region of achievable rate pairs $(R_1, R_2)$.
Unfortunately, it is rather difficult to obtain an exact characterization because, in
principle, the coding schemes in Definition 8.1 can exploit both the interference of
transmitted signals at the eavesdropper's terminal and feedback. To obtain some insight,
we study instead several simpler strategies that partially decouple these two effects;
although these strategies are likely to be suboptimal, their analysis is more amenable
and yields achievable rate regions as functions of the channel parameters $h_1$ and $h_2$
and the power constraints $P_1$ and $P_2$. Specifically, we investigate the following coding
schemes.


• Cooperative jamming: one of the legitimate parties sacrifices its entire rate to jam the
eavesdropper; this strategy has little effect on the eavesdropper's SNR if the channel
gains $h_1$ and $h_2$ are small, but jamming with noise can be implemented no matter
what the values of $h_1$ and $h_2$ are, and does not require synchronization between the
legitimate parties.

• Coded cooperative jamming: both Alice and Bob transmit coded information over
the channel; if $h_1 \approx h_2$, codewords interfere with roughly the same strength at
Eve's terminal, which allows Alice and Bob to increase their secure communication
rates while still communicating messages; however, if $h_1$ or $h_2$ is too large, this strategy is
likely to be ineffective because the eavesdropper can probably decode the interfering
signals.

• Key exchange: one of the legitimate parties sacrifices part of its secure communication
rate to exchange a secret key, which is later used by the other party to encrypt
messages with a one-time pad; this is perhaps the simplest strategy that exploits
feedback, but the key-distillation strategies described in Chapter 4 could also be adapted
for the TWWTC.


As a benchmark for achievable secure communication rates, we consider the region
achieved with a coding scheme in which Alice and Bob ignore the interference created
at the eavesdropper's terminal and do not exploit the feedback allowed by the two-way
nature of the channel. This is a special instance of the generic code in Definition 8.1, for
which

• Alice has a single encoding function $f_1: \mathcal{M}_1 \times \mathcal{R}_1 \to \mathcal{X}_1^n$ and a single decoding
function $g_1: \mathcal{Y}_1^n \to \mathcal{M}_2 \cup \{?\}$;

• Bob has a single encoding function $f_2: \mathcal{M}_2 \times \mathcal{R}_2 \to \mathcal{X}_2^n$ and a single decoding
function $g_2: \mathcal{Y}_2^n \to \mathcal{M}_1 \cup \{?\}$.


To present all subsequent results concisely, we introduce the function
$$C: \mathbb{R}^+ \to \mathbb{R}^+: x \mapsto \frac{1}{2}\log(1 + x),$$
such that $C(x)$ represents the capacity of a Gaussian channel with received SNR $x$.


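Proposition 8.1. The rate region $\mathcal{R}_0$ defined by
$$\mathcal{R}_0 \triangleq \left\{(R_1, R_2): \begin{array}{l} 0 \leq R_1 \leq \left(C(P_1) - C(h_1P_1)\right)^+\\ 0 \leq R_2 \leq \left(C(P_2) - C(h_2P_2)\right)^+ \end{array}\right\}$$
is achievable with independently designed wiretap codes that ignore feedback and the
presence of interference at the eavesdropper's terminal.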
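The benchmark region is easy to evaluate numerically. The sketch below assumes base-2 logarithms, so that rates are in bits per channel use, and uses illustrative helper names; it computes the two corner rates of $\mathcal{R}_0$ for the parameters used in Figure 8.3 ($h_1 = 0.2$, $h_2 = 0.3$, $P_1 = 20$, $P_2 = 10$) and shows how Alice's rate collapses to zero once $h_1 \geq 1$.

```python
import math

def C(x):
    """C(x) = (1/2) log2(1 + x): Gaussian capacity in bits per use at SNR x."""
    return 0.5 * math.log2(1 + x)

def benchmark_region(h1, h2, P1, P2):
    """Maximal rates (R1, R2) of the rectangular region R_0 of Proposition 8.1."""
    r1 = max(C(P1) - C(h1 * P1), 0.0)
    r2 = max(C(P2) - C(h2 * P2), 0.0)
    return r1, r2

def in_benchmark_region(R1, R2, h1, h2, P1, P2):
    """True if the rate pair (R1, R2) lies in R_0."""
    r1, r2 = benchmark_region(h1, h2, P1, P2)
    return 0 <= R1 <= r1 and 0 <= R2 <= r2

print(benchmark_region(0.2, 0.3, 20, 10))      # about (1.035, 0.730)
print(benchmark_region(1.5, 0.3, 20, 10)[0])   # 0.0, since h1 >= 1
```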
Proof. Fix $\epsilon > 0$. We introduce the random variable $Z_{1,i} = \sqrt{h_1}X_{1,i} + N_{e,i}$, which
represents the observation of an eavesdropper who cancels out Bob's interference $X_{2,i}$
perfectly, and we choose a rate $R_1$ that satisfies
$$0 \leq R_1 < \left(C(P_1) - C(h_1P_1)\right)^+. \tag{8.4}$$

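By Corollary 5.1, there exists a $(2^{nR_1}, n)$ code $\mathcal{C}_1$ for communication between Alice and
Bob such that
$$\mathbb{P}\left[\hat{M}_1 \neq M_1 \mid \mathcal{C}_1\right] \leq \epsilon \quad\text{and}\quad \frac{1}{n}I(M_1; Z_1^n \mid \mathcal{C}_1) \leq \epsilon. \tag{8.5}$$
Similarly, we introduce $Z_{2,i} = \sqrt{h_2}X_{2,i} + N_{e,i}$, which represents the observation of an
eavesdropper who cancels out Alice's interference $X_{1,i}$ perfectly, and we choose a rate
$R_2$ that satisfies
$$0 \leq R_2 < \left(C(P_2) - C(h_2P_2)\right)^+. \tag{8.6}$$
Again, Corollary 5.1 ensures the existence of a $(2^{nR_2}, n)$ code $\mathcal{C}_2$ for communication
between Bob and Alice such that
$$\mathbb{P}\left[\hat{M}_2 \neq M_2 \mid \mathcal{C}_2\right] \leq \epsilon \quad\text{and}\quad \frac{1}{n}I(M_2; Z_2^n \mid \mathcal{C}_2) \leq \epsilon. \tag{8.7}$$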


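The pair of codes $(\mathcal{C}_1, \mathcal{C}_2)$ defines a special instance of a $(2^{nR_1}, 2^{nR_2}, n)$ code $\mathcal{C}_n$ for the
TWWTC, which we show achieves the rate pair $(R_1, R_2)$ in the sense of Definition 8.2.
By the union bound, (8.5), and (8.7), we obtain
$$\mathbb{P}\left[\hat{M}_1 \neq M_1 \text{ or } \hat{M}_2 \neq M_2 \mid \mathcal{C}_1\mathcal{C}_2\right] \leq \mathbb{P}\left[\hat{M}_1 \neq M_1 \mid \mathcal{C}_1\right] + \mathbb{P}\left[\hat{M}_2 \neq M_2 \mid \mathcal{C}_2\right] \leq \delta(\epsilon).$$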


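In addition, the information rate leaked to the eavesdropper about messages $M_1$ and $M_2$
can be bounded as
$$\begin{aligned}
\frac{1}{n}I(M_1M_2; Z^n \mid \mathcal{C}_1\mathcal{C}_2) &= \frac{1}{n}I(M_1; Z^n \mid \mathcal{C}_1\mathcal{C}_2) + \frac{1}{n}I(M_2; Z^n \mid M_1\mathcal{C}_1\mathcal{C}_2)\\
&\leq \frac{1}{n}I(M_1; Z^nX_2^n \mid \mathcal{C}_1\mathcal{C}_2) + \frac{1}{n}I(M_2; Z^nX_1^n \mid M_1\mathcal{C}_1\mathcal{C}_2)\\
&= \frac{1}{n}I(M_1; Z_1^nX_2^n \mid \mathcal{C}_1\mathcal{C}_2) + \frac{1}{n}I(M_2; Z_2^nX_1^n \mid M_1\mathcal{C}_1\mathcal{C}_2),
\end{aligned}$$
where the last equality follows because of the one-to-one mapping between $(Z_1^n, X_2^n)$
and $(Z^n, X_2^n)$ and the one-to-one mapping between $(Z_2^n, X_1^n)$ and $(Z^n, X_1^n)$. Since $X_2^n$ is
independent of $M_1$ and $Z_1^n$ by construction, notice that
$$\frac{1}{n}I(M_1; Z_1^nX_2^n \mid \mathcal{C}_1\mathcal{C}_2) = \frac{1}{n}I(M_1; Z_1^n \mid \mathcal{C}_1).$$
Similarly, since $M_1$ and $X_1^n$ are independent of $M_2$ and $Z_2^n$ by construction,
$$\frac{1}{n}I(M_2; Z_2^nX_1^n \mid M_1\mathcal{C}_1\mathcal{C}_2) = \frac{1}{n}I(M_2; Z_2^n \mid \mathcal{C}_2).$$
Therefore, by (8.5) and (8.7),
$$\frac{1}{n}I(M_1M_2; Z^n \mid \mathcal{C}_1\mathcal{C}_2) \leq \frac{1}{n}I(M_1; Z_1^n \mid \mathcal{C}_1) + \frac{1}{n}I(M_2; Z_2^n \mid \mathcal{C}_2) \leq \delta(\epsilon).$$
Since $\epsilon > 0$ is arbitrary, all the rate pairs $(R_1, R_2)$ satisfying (8.4) and (8.6) are achievable
over the TWWTC.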


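Note that $\mathcal{R}_0$ has a rectangular shape, but the region collapses to a segment as soon as
$h_1 \geq 1$ or $h_2 \geq 1$ (and to a single point if both hold), that is, as soon as the eavesdropper
obtains an SNR at least as good as that of Bob for Alice's signal or that of Alice for Bob's signal.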


Cooperative jamming 


As a first attempt to increase secure communication rates, we analyze a communication
strategy in which Alice and Bob take turns jamming Eve to reduce her SNR. Formally,
we call cooperative jamming code (cooperative jamming for short) an instance of the
generic code in Definition 8.1 such that

• there is only one party (say Alice) that transmits a message without relying on
feedback; in other words, we consider a single encoding function $f_1: \mathcal{M}_1 \times \mathcal{R}_1 \to \mathcal{X}_1^n$
and a single decoding function $g_2: \mathcal{R}_2 \times \mathcal{Y}_2^n \to \mathcal{M}_1 \cup \{?\}$;

• the other party (Bob) transmits a jamming signal, which could depend on past channel
observations; in other words, we consider a sequence of jamming functions
$f_{2,i}: \mathcal{R}_2 \times \mathcal{Y}_2^{i-1} \to \mathcal{X}_2$ for $i \in [1, n]$.

The probability of error of such a code reduces to $\mathbb{P}[\hat{M}_1 \neq M_1 \mid \mathcal{C}_n]$. Notice that
restricting codes to cooperative jamming is tantamount to considering the simplified
channel model illustrated in Figure 8.2, which we refer to as the cooperative jamming
channel model. Although a cooperative jamming code does not exploit feedback to transmit
messages, we emphasize that it is not a trivial code because the jamming signals are allowed to
depend on past observations and can be quite sophisticated. Finally, note that the roles
of Alice and Bob can be reversed, with Bob transmitting messages while Alice is
jamming.


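Proposition 8.2 (Tekin and Yener). The region $\mathcal{R}_{\text{cj}}$ defined as
$$\mathcal{R}_{\text{cj}} \triangleq \bigcup_{\alpha \in [0,1]} \left\{(R_1, R_2): \begin{array}{l} 0 \leq R_1 \leq \alpha\left(C(P_1) - C\left(\dfrac{h_1P_1}{1 + h_2P_2}\right)\right)^+\\[2ex] 0 \leq R_2 \leq (1 - \alpha)\left(C(P_2) - C\left(\dfrac{h_2P_2}{1 + h_1P_1}\right)\right)^+ \end{array}\right\}$$
is achievable with cooperative jamming codes.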
Proof. Assume that Bob uses his entire power to jam Eve with i.i.d. Gaussian noise
with variance¹ $P_2$. Effectively, this strategy transforms the two-way Gaussian wiretap
channel into a one-way Gaussian wiretap channel from Alice to Bob characterized by
the input-output relationships
$$Y_{2,i} = X_{1,i} + N_{1,i}, \qquad Z_i = \sqrt{h_1}X_{1,i} + N'_{e,i},$$
where $N'_{e,i}$ is a zero-mean Gaussian random variable with variance $1 + h_2P_2$. By
Corollary 5.1, the secrecy capacity of this wiretap channel is
$$C_1 = \left(C(P_1) - C\left(\frac{h_1P_1}{1 + h_2P_2}\right)\right)^+.$$
Therefore, all rate pairs $(R_1, R_2)$ with $R_2 = 0$ and $R_1 < C_1$ are achievable.
Similarly, if Alice uses her power to jam Eve with i.i.d. Gaussian noise with variance
$P_1$, Bob effectively communicates with Alice over a Gaussian wiretap channel with
secrecy capacity
$$C_2 = \left(C(P_2) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\right)^+.$$
Therefore, all rate pairs $(R_1, R_2)$ with $R_1 = 0$ and $R_2 < C_2$ are achievable.

The full region is obtained by time-sharing between these two modes of operation:
during a fraction $\alpha$ of the time, Alice transmits with power $P_1$ while Bob jams with
power $P_2$, and, during the remaining fraction $(1 - \alpha)$ of the time, Alice jams with power
$P_1$ while Bob communicates with power $P_2$.
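As a numerical sketch (base-2 logarithms, illustrative helper names), the corner rates $C_1$ and $C_2$ of the cooperative jamming region and a time-sharing point can be computed as follows for the parameters of Figure 8.3 ($h_1 = 0.2$, $h_2 = 0.3$, $P_1 = 20$, $P_2 = 10$).

```python
import math

def C(x):
    """C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

def coop_jamming_corners(h1, h2, P1, P2):
    """Corner rates (C1, C2): one party transmits while the other jams with Gaussian noise."""
    c1 = max(C(P1) - C(h1 * P1 / (1 + h2 * P2)), 0.0)  # Bob jams with power P2
    c2 = max(C(P2) - C(h2 * P2 / (1 + h1 * P1)), 0.0)  # Alice jams with power P1
    return c1, c2

def time_sharing_point(alpha, h1, h2, P1, P2):
    """Rate pair obtained when Alice transmits during a fraction alpha of the time."""
    c1, c2 = coop_jamming_corners(h1, h2, P1, P2)
    return alpha * c1, (1 - alpha) * c2

print(coop_jamming_corners(0.2, 0.3, 20, 10))     # about (1.696, 1.391)
print(time_sharing_point(0.5, 0.2, 0.3, 20, 10))  # half of each corner rate
```

For these parameters, Alice's maximal rate rises from about 1.035 bits (benchmark corner $C(P_1) - C(h_1P_1)$) to about 1.696 bits, because Bob's noise lowers Eve's effective SNR from $h_1P_1 = 4$ to $h_1P_1/(1 + h_2P_2) = 1$.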


Alice's and Bob's maximum secure communication rates in Proposition 8.2 are always at least
as high as those obtained with the benchmark strategy; however, the jamming terminal
ignores the symbols it receives, and one can wonder whether adapting the jamming to
past observations could yield higher secure communication rates. In an idealized case,
if Bob had access to the signal $X_1^n$ sent by Alice, he could tremendously reduce the
eavesdropper's SNR by performing correlated jamming and partially canceling out $X_1^n$.
In our setting, Bob only has causal and imperfect knowledge of $X_1^n$, but he might still
be able to exploit the structure of the codewords. Perhaps surprisingly, it turns out that
jamming with Gaussian noise seems close to optimal. We make this statement precise by
deriving an upper bound on the secure communication rates achievable by Alice when
Bob performs cooperative jamming.

¹ Strictly speaking, transmitting i.i.d. Gaussian noise with power $P_2$ may violate the power constraint;
nevertheless, if the variance is set to $P_2 - \epsilon$ for some arbitrary $\epsilon > 0$, then the probability of violating the
constraint can be made arbitrarily small for $n$ large enough, and the results remain virtually unchanged.


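Proposition 8.3 (Bloch). The secure rates achieved by Alice with Bob performing
cooperative jamming must satisfy
$$R_1 \leq \max \min\left(I(X_1; Y_2),\; I(X_1; Y_2 \mid ZX_2) + I(X_2; Z \mid X_1)\right),$$
where the maximization is over random variables $X_1$, $X_2$, $Y_2$, and $Z$ with joint distribution
$p_{X_1X_2Y_2Z}$ such that
$$\forall (x_1, x_2, y_2, z) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}_2 \times \mathcal{Z} \quad p_{X_1X_2Y_2Z}(x_1, x_2, y_2, z) = p_{Y_2Z|X_1X_2}(y_2, z \mid x_1, x_2)\,p_{X_1X_2}(x_1, x_2),$$
and such that $\mathbb{E}[X_1^2] \leq P_1$ and $\mathbb{E}[X_2^2] \leq P_2$.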


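Proof. Let $R_1$ be an achievable rate with cooperative jamming, and let $\epsilon > 0$. For
$n$ sufficiently large, there exists a $(2^{nR_1}, n)$ code $\mathcal{C}_n$ such that $P_e(\mathcal{C}_n) \leq \delta(\epsilon)$. In the
following, we omit the conditioning on $\mathcal{C}_n$ to simplify the notation. Fano's inequality also
ensures that $(1/n)H(M_1 \mid Y_2^n) \leq \delta(\epsilon)$; therefore,
$$\begin{aligned}
R_1 &\leq \frac{1}{n}H(M_1)\\
&\leq \frac{1}{n}H(M_1) - \frac{1}{n}H(M_1 \mid Y_2^n) + \delta(\epsilon)\\
&= \frac{1}{n}I(M_1; Y_2^n) + \delta(\epsilon)\\
&\leq \frac{1}{n}I(X_1^n; Y_2^n) + \delta(\epsilon)\\
&\leq \frac{1}{n}\sum_{j=1}^n I(X_{1,j}; Y_{2,j}) + \delta(\epsilon).
\end{aligned}\tag{8.8}$$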


We now develop a second upper bound that depends on the jamming input $X_2$ and the
eavesdropper's observation $Z$. We do so by computing an upper bound for the secret-key
capacity of the cooperative jamming channel model in Figure 8.2. This approach is
motivated by two observations. First, any upper bound for the secret-key capacity is also
an upper bound for the secrecy capacity because introducing public discussion cannot
reduce achievable secrecy rates. Second, we already know from Theorem 4.8 how to obtain a
single-letter upper bound for the secret-key capacity of a channel model. The
cooperative jamming channel model is slightly different from the channel model of
Chapter 4 because not only Alice but also Bob has an input to the channel. Nevertheless,
the proof of Theorem 4.8 can be adapted to account for this additional channel input.

A key-distillation strategy for the cooperative jamming channel model is formally
defined as follows.

Definition 8.3. A $(2^{nR}, n)$ key-distillation strategy $\mathcal{S}_n$ for a cooperative jamming channel
model consists of

• a key alphabet $\mathcal{K} = [1, 2^{nR}]$;

• an alphabet $\mathcal{A}$ used by Alice to communicate over the public channel;

• an alphabet $\mathcal{B}$ used by Bob to communicate over the public channel;

• a source of local randomness for Alice $(\mathcal{R}_1, p_{R_1})$;

• a source of local randomness for Bob $(\mathcal{R}_2, p_{R_2})$;

• an integer $r \in \mathbb{N}^*$ that represents the number of rounds of communication;

• a set of $n$ distinct integers $\{i_j\}_{j=1}^n \subset [1, r]$ that represents the rounds in which Alice and
Bob transmit symbols over the channel;

• $r - n$ encoding functions $f_i: \mathcal{B}^{i-1} \times \mathcal{R}_1 \to \mathcal{A}$ for $i \in [1, r] \setminus \{i_j\}_{j=1}^n$;

• $r - n$ encoding functions $g_i$ for $i \in [1, r] \setminus \{i_j\}_{j=1}^n$ of the form $g_i: \mathcal{Y}_2^j \times \mathcal{A}^{i-1} \times \mathcal{R}_2 \to \mathcal{B}$
if $i \in [i_j + 1, i_{j+1} - 1]$;

• $n$ functions $h_j: \mathcal{B}^{i_j-1} \times \mathcal{R}_1 \to \mathcal{X}_1$ for $j \in [1, n]$ to generate channel inputs;

• $n$ functions $h'_j: \mathcal{A}^{i_j-1} \times \mathcal{R}_2 \times \mathcal{Y}_2^{j-1} \to \mathcal{X}_2$ for $j \in [1, n]$ to generate channel inputs;

• a key-distillation function $k_1: \mathcal{X}_1^n \times \mathcal{B}^r \times \mathcal{R}_1 \to \mathcal{K}$;

• a key-distillation function $k_2: \mathcal{Y}_2^n \times \mathcal{A}^r \times \mathcal{R}_2 \to \mathcal{K}$;

and operates as follows:

• Alice generates a realization $r_1$ of her source of local randomness while Bob generates
$r_2$ from his;

• in round $i \in [1, i_1 - 1]$, Alice transmits message $a_i = f_i(b^{i-1}, r_1)$ and Bob transmits
message $b_i = g_i(a^{i-1}, r_2)$;

• in round $i_j$ with $j \in [1, n]$, Alice transmits symbol $x_{1,j} = h_j(b^{i_j-1}, r_1)$ and Bob
transmits symbol $x_{2,j} = h'_j(a^{i_j-1}, r_2, y_2^{j-1})$ over the channel; Bob and Eve observe
the symbols $y_{2,j}$ and $z_j$, respectively;

• in round $i \in [i_j + 1, i_{j+1} - 1]$, Alice transmits message $a_i = f_i(b^{i-1}, r_1)$ and Bob
transmits message $b_i = g_i(y_2^j, a^{i-1}, r_2)$;

• after the last round, Alice computes a key $K = k_1(x_1^n, b^r, r_1)$ and Bob computes a key
$\hat{K} = k_2(y_2^n, a^r, r_2)$.

In addition, the vectors of channel inputs $X_1^n$ and $X_2^n$ should satisfy the power constraints
$(1/n)\sum_{j=1}^n \mathbb{E}[X_{1,j}^2] \leq P_1$ and $(1/n)\sum_{j=1}^n \mathbb{E}[X_{2,j}^2] \leq P_2$.
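The interleaving of channel uses and public-discussion rounds in Definition 8.3 can be pictured with a small helper. The sketch below is a hypothetical function, not part of the original definition: it takes $r$ and the channel-use rounds $\{i_j\}$ and lists, for each $j \in [0, n]$, the public rounds $i_j + 1, \dots, i_{j+1} - 1$ that take place between channel use $j$ and channel use $j + 1$, with the boundary conventions $i_0 = 0$ and $i_{n+1} = r + 1$.

```python
def round_schedule(r, channel_rounds):
    """Hypothetical helper: split rounds 1..r into public-discussion blocks.

    channel_rounds holds the n distinct channel-use rounds {i_j} in [1, r].
    Returns (blocks, uses), where blocks[j] lists the public rounds
    i_j + 1, ..., i_{j+1} - 1 for j = 0, ..., n (with i_0 = 0, i_{n+1} = r + 1).
    """
    uses = sorted(channel_rounds)
    if len(set(uses)) != len(uses) or not all(1 <= i <= r for i in uses):
        raise ValueError("channel rounds must be distinct integers in [1, r]")
    bounds = [0] + uses + [r + 1]
    blocks = [list(range(bounds[j] + 1, bounds[j + 1])) for j in range(len(uses) + 1)]
    return blocks, uses

# r = 8 rounds, channel uses in rounds 2, 3, and 6 (n = 3).
blocks, uses = round_schedule(8, [2, 3, 6])
print(blocks)   # [[1], [], [4, 5], [7, 8]]
```

The $r - n = 5$ public rounds are spread over $n + 1 = 4$ blocks; a block is empty whenever two channel uses occur in consecutive rounds.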


By convention, we set $i_{n+1} \triangleq r + 1$, $i_0 \triangleq 0$, $A^0 \triangleq 0$, and $B^0 \triangleq 0$. As in Chapter 4,
the indices $\{i_j\}_{j=1}^n$ and the sources of local randomness $(\mathcal{R}_1, p_{R_1})$ and $(\mathcal{R}_2, p_{R_2})$ can be
optimized as part of the strategy. A rate $R$ is an achievable key rate for the cooperative jamming
channel model if the conditions in Definition 4.3 are satisfied. If $R$ is an achievable
secret-key rate, we can follow the same steps as those leading to (4.20) and show that
$$\begin{aligned}
R &\leq \frac{1}{n}H(K) + \delta(\epsilon)\\
&\leq \frac{1}{n}I(R_1X_1^n; R_2Y_2^n \mid A^rB^rZ^n) + \delta(\epsilon)\\
&= \frac{1}{n}I(R_1; R_2Y_2^n \mid A^rB^rZ^n) + \delta(\epsilon),
\end{aligned}$$
where the last equality follows because $X_{1,j} = h_j(B^{i_j-1}, R_1)$. As in the proof of
Theorem 4.8, we introduce the random variables $\mathsf{A}_j$, which represent the messages exchanged
over the public channel between two successive uses of the channel:
$$\begin{aligned}
\mathsf{A}_0 &\triangleq (A_1, B_1, \dots, A_{i_1-1}, B_{i_1-1}),\\
\mathsf{A}_j &\triangleq (A_{i_j+1}, B_{i_j+1}, \dots, A_{i_{j+1}-1}, B_{i_{j+1}-1}) \quad \text{for } j \in [1, n].
\end{aligned}$$
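We then expand $I(R_1; R_2Y_2^n \mid A^rB^rZ^n)$ as
$$\begin{aligned}
I(R_1; R_2Y_2^n \mid A^rB^rZ^n) &= I(R_1; R_2 \mid \mathsf{A}_0)\\
&\quad + \sum_{j=1}^n \left[I(R_1; \mathsf{A}_j \mid Z^jY_2^jR_2\mathsf{A}_0\mathsf{A}^{j-1}) - I(R_1; \mathsf{A}_j \mid Z^j\mathsf{A}_0\mathsf{A}^{j-1})\right]\\
&\quad + \sum_{j=1}^n \left[I(R_1; Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) - I(R_1; Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1})\right].
\end{aligned}\tag{8.9}$$
As in (4.65), the first term in (8.9) satisfies $I(R_1; R_2 \mid \mathsf{A}_0) = 0$ by Lemma 4.2. In addition,
the terms in the first sum of (8.9) satisfy
$$I(R_1; \mathsf{A}_j \mid Z^jY_2^jR_2\mathsf{A}_0\mathsf{A}^{j-1}) - I(R_1; \mathsf{A}_j \mid Z^j\mathsf{A}_0\mathsf{A}^{j-1}) \leq 0,$$
as has already been proved for Theorem 4.8. The terms in the second sum of (8.9) can
be rewritten as
$$\begin{aligned}
&I(R_1; Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) - I(R_1; Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1})\\
&\quad = H(Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) - H(Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_1R_2\mathsf{A}_0\mathsf{A}^{j-1})\\
&\qquad - H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}) + H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}R_1).
\end{aligned}$$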


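We can further simplify this expression by recalling that $X_{1,j} = h_j(B^{i_j-1}, R_1)$, $X_{2,j} =
h'_j(A^{i_j-1}, R_2, Y_2^{j-1})$, and
$$R_1R_2Y_2^{j-1}Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1} \to X_{1,j}X_{2,j} \to Y_{2,j}Z_j$$
forms a Markov chain. Using these properties, we obtain
$$H(Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_1R_2\mathsf{A}_0\mathsf{A}^{j-1}) = H(Y_{2,j}Z_j \mid X_{1,j}X_{2,j})$$
and
$$H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}R_1) = H(Z_j \mid X_{1,j}Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}R_1) \leq H(Z_j \mid X_{1,j}).$$
Therefore, we can bound the terms in the second sum of (8.9) by
$$\begin{aligned}
&H(Y_{2,j}Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) - H(Y_{2,j}Z_j \mid X_{1,j}X_{2,j}) - H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}) + H(Z_j \mid X_{1,j})\\
&\quad = H(Y_{2,j} \mid Z_jZ^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) + H(Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1})\\
&\qquad - H(Y_{2,j} \mid Z_jX_{1,j}X_{2,j}) - H(Z_j \mid X_{1,j}X_{2,j}) - H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1}) + H(Z_j \mid X_{1,j})\\
&\quad \leq H(Y_{2,j} \mid Z_jX_{2,j}) - H(Y_{2,j} \mid Z_jX_{1,j}X_{2,j}) - H(Z_j \mid X_{1,j}X_{2,j}) + H(Z_j \mid X_{1,j}),
\end{aligned}$$
where the inequality follows because conditioning does not increase entropy, because $X_{2,j}$ is a
function of $(A^{i_j-1}, R_2, Y_2^{j-1})$, and because $H(Z_j \mid Z^{j-1}Y_2^{j-1}R_2\mathsf{A}_0\mathsf{A}^{j-1}) \leq H(Z_j \mid Z^{j-1}\mathsf{A}_0\mathsf{A}^{j-1})$.
All in all, we obtain our second bound:
$$R \leq \frac{1}{n}\sum_{j=1}^n \left(I(X_{1,j}; Y_{2,j} \mid Z_jX_{2,j}) + I(Z_j; X_{2,j} \mid X_{1,j})\right) + \delta(\epsilon). \tag{8.10}$$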


Finally, we introduce a random variable $Q$ that is uniformly distributed on $[1, n]$ and
independent of all other random variables, and we define
$$X_1 \triangleq X_{1,Q}, \quad X_2 \triangleq X_{2,Q}, \quad Y_2 \triangleq Y_{2,Q}, \quad\text{and}\quad Z \triangleq Z_Q.$$
The transition probabilities from $X_1X_2$ to $Y_2Z$ are the original transition probabilities
of the channel $p_{Y_2Z|X_1X_2}$, and, in addition, $X_1$ and $X_2$ should satisfy the power
constraints $\mathbb{E}[X_1^2] \leq P_1$ and $\mathbb{E}[X_2^2] \leq P_2$. By substituting these random variables into (8.8)
and (8.10), and using the fact that $Q \to X_1X_2 \to Y_2Z$ forms a Markov chain, we obtain
$$R \leq \min\left(I(X_1; Y_2),\; I(X_1; Y_2 \mid ZX_2) + I(X_2; Z \mid X_1)\right) + \delta(\epsilon).$$
Since $\epsilon$ can be chosen arbitrarily small and since we can optimize the distribution of
inputs $X_1X_2$, we obtain the desired result.


The optimization of the upper bound has to be performed over the random variables
$(X_1, X_2)$ jointly; therefore, the terms $I(X_1; Y_2 \mid ZX_2)$ and $I(X_2; Z \mid X_1)$ are not
independent. Nevertheless, the result can still be understood intuitively as follows. The term
$I(X_1; Y_2 \mid ZX_2)$ represents the secrecy rate achieved in the presence of an eavesdropper
who would be able to cancel out the jamming signal $X_2$ perfectly, whereas the second
term, $I(X_2; Z \mid X_1)$, represents the information that the eavesdropper has to obtain in
order to "identify" the jamming signal and cancel it out. By specializing Proposition 8.3
further, we obtain the following result.


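Proposition 8.4 (He and Yener). The region of rates achievable with cooperative
jamming is included in the region $\mathcal{R}_{\text{cj}}^{\text{out}}$ defined as
$$\mathcal{R}_{\text{cj}}^{\text{out}} \triangleq \left\{(R_1, R_2): \begin{array}{l} 0 \leq R_1 \leq \min\left(C\left(\dfrac{P_1}{1 + h_1P_1}\right) + C(h_2P_2),\; C(P_1)\right)\\[2ex] 0 \leq R_2 \leq \min\left(C\left(\dfrac{P_2}{1 + h_2P_2}\right) + C(h_1P_1),\; C(P_2)\right) \end{array}\right\}.$$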
Proof. We can further bound the result of Proposition 8.3 as
$$R_1 \leq \min\left(\max I(X_1; Y_2),\; \max\left(I(X_1; Y_2 \mid ZX_2) + I(X_2; Z \mid X_1)\right)\right),$$
where the maximum is taken over all joint distributions $p_{X_1X_2}$ such that $\mathbb{E}[X_1^2] \leq P_1$
and $\mathbb{E}[X_2^2] \leq P_2$. The first term, $I(X_1; Y_2)$, cannot exceed the capacity of the channel
from Alice to Bob; therefore,
$$\max I(X_1; Y_2) \leq C(P_1).$$
To bound the term $\max(I(X_1; Y_2 \mid ZX_2) + I(X_2; Z \mid X_1))$, we introduce the random variables
$$Z_1 = \sqrt{h_1}X_1 + N_e \quad\text{and}\quad Z_2 = \sqrt{h_2}X_2 + N_e,$$
which represent the observations of an eavesdropper who would be able to cancel out
either one of the signals $X_1$ and $X_2$. Then,
$$\begin{aligned}
I(X_1; Y_2 \mid ZX_2) &= h(Y_2 \mid ZX_2) - h(Y_2 \mid ZX_1X_2)\\
&= h(Y_2 \mid Z_1X_2) - h(N_1)\\
&\leq h(Y_2 \mid Z_1) - h(N_1)\\
&= h(Y_2Z_1) - h(Z_1) - h(N_1).
\end{aligned}\tag{8.11}$$


Similarly,
$$\begin{aligned}
I(X_2; Z \mid X_1) &= h(Z \mid X_1) - h(Z \mid X_1X_2)\\
&= h(Z_2 \mid X_1) - h(N_e)\\
&\leq h(Z_2) - h(N_e).
\end{aligned}\tag{8.12}$$
We now show that the upper bounds (8.11) and (8.12) are maximized by choosing
independent Gaussian random variables for $X_1$ and $X_2$. Since (8.11) depends only on
$X_1$ and (8.12) depends only on $X_2$, the bounds are maximized with independent random
variables. In addition, the term $h(Z_2)$ is maximized if $Z_2$ is Gaussian, which is achieved
if $X_2$ is Gaussian as well. To show that $h(Y_2 \mid Z_1)$ is also maximized with $X_1$ Gaussian, let
$\mathrm{LLSE}(Z_1)$ denote the linear least-squares estimate of $Y_2$ based on $Z_1$, and let $\lambda_{\mathrm{LLSE}}$ denote
the corresponding mean-square estimation error. Then,
$$h(Y_2 \mid Z_1) = h(Y_2 - \mathrm{LLSE}(Z_1) \mid Z_1) \leq h(Y_2 - \mathrm{LLSE}(Z_1)) \leq \frac{1}{2}\log(2\pi e\lambda_{\mathrm{LLSE}}).$$
The inequalities are equalities if $Y_2$ and $Z_1$ are jointly Gaussian, which is achieved if $X_1$
is Gaussian; hence, we can evaluate (8.11) and (8.12) with two independent random
variables $X_1 \sim \mathcal{N}(0, P_1)$ and $X_2 \sim \mathcal{N}(0, P_2)$.

On substituting $X_2 \sim \mathcal{N}(0, P_2)$ into (8.12), we obtain directly
$$I(X_2; Z \mid X_1) \leq C(h_2P_2). \tag{8.13}$$
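A little more work is needed to bound (8.11). If $X_1 \sim \mathcal{N}(0, P_1)$, then the vector $(Y_2, Z_1)^\intercal$
defined as
$$\begin{pmatrix} Y_2\\ Z_1 \end{pmatrix} = \begin{pmatrix} 1\\ \sqrt{h_1} \end{pmatrix} X_1 + \begin{pmatrix} 1\\ 0 \end{pmatrix} N_1 + \begin{pmatrix} 0\\ 1 \end{pmatrix} N_e$$
is also Gaussian with zero mean. Since $X_1$, $N_1$, and $N_e$ are independent, its covariance
matrix is
$$\mathbf{K}_{Y_2Z_1} = \begin{pmatrix} 1 + P_1 & \sqrt{h_1}P_1\\ \sqrt{h_1}P_1 & 1 + h_1P_1 \end{pmatrix},$$
whose determinant is $\left|\mathbf{K}_{Y_2Z_1}\right| = (1 + P_1)(1 + h_1P_1) - h_1P_1^2 = 1 + P_1 + h_1P_1$; therefore,
$$\begin{aligned}
I(X_1; Y_2 \mid ZX_2) &\leq h(Y_2Z_1) - h(Z_1) - h(N_1)\\
&= \frac{1}{2}\log\left((2\pi e)^2\left|\mathbf{K}_{Y_2Z_1}\right|\right) - \frac{1}{2}\log\left(2\pi e(1 + h_1P_1)\right) - \frac{1}{2}\log(2\pi e)\\
&= C(P_1 + h_1P_1) - C(h_1P_1)\\
&= C\left(\frac{P_1}{1 + h_1P_1}\right).
\end{aligned}\tag{8.14}$$
On combining (8.13) and (8.14), we obtain the second part of the upper bound
for $R_1$:
$$\max\left(I(X_1; Y_2 \mid ZX_2) + I(X_2; Z \mid X_1)\right) \leq C\left(\frac{P_1}{1 + h_1P_1}\right) + C(h_2P_2).$$
The bound for $R_2$ is obtained with identical steps by swapping the roles of Alice and
Bob.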
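The chain of equalities in (8.14) can be sanity-checked numerically. Assuming unit noise variances as in (8.1) and base-2 logarithms, the sketch below evaluates $h(Y_2Z_1) - h(Z_1) - h(N_1)$ from the covariance matrix of $(Y_2, Z_1)$, evaluates the same quantity through the mean-square error of the linear least-squares estimate of $Y_2$ from $Z_1$, and compares both with the closed form $C(P_1/(1 + h_1P_1))$. The helper name is illustrative.

```python
import math

def C(x):
    """C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

def check_8_14(h1, P1):
    """Evaluate the bound (8.14) three ways for X1 ~ N(0, P1), unit-variance N1 and Ne."""
    # Entries of the covariance matrix of (Y2, Z1) = (X1 + N1, sqrt(h1) X1 + Ne).
    k11, k12, k22 = 1 + P1, math.sqrt(h1) * P1, 1 + h1 * P1
    det_K = k11 * k22 - k12 ** 2          # equals 1 + P1 + h1 P1
    two_pi_e = 2 * math.pi * math.e
    # h(Y2 Z1) - h(Z1) - h(N1) with Gaussian differential entropies, in bits.
    via_entropies = (0.5 * math.log2(two_pi_e ** 2 * det_K)
                     - 0.5 * math.log2(two_pi_e * k22)
                     - 0.5 * math.log2(two_pi_e))
    # h(Y2 | Z1) - h(N1) through the LLSE mean-square error of Y2 given Z1.
    lam = k11 - k12 ** 2 / k22
    via_llse = 0.5 * math.log2(lam)
    closed_form = C(P1 / (1 + h1 * P1))
    return via_entropies, via_llse, closed_form

print(check_8_14(0.2, 20.0))   # three values that agree up to rounding, about 1.161
```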


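The regions $\mathcal{R}_{\text{cj}}^{\text{out}}$ and $\mathcal{R}_{\text{cj}}$ cannot coincide because $\mathcal{R}_{\text{cj}}^{\text{out}}$ has a rectangular shape whereas
$\mathcal{R}_{\text{cj}}$ has a triangular shape (obtained with time-sharing). Nevertheless, Proposition 8.4 still
provides a reasonably tight bound for the maximum secure rate achieved by either Alice
or Bob with cooperative jamming over a wide range of channel parameters. Figure 8.3
illustrates typical regions $\mathcal{R}_{\text{cj}}^{\text{out}}$, $\mathcal{R}_{\text{cj}}$, and $\mathcal{R}_0$, for which the extremal points of $\mathcal{R}_{\text{cj}}$ are
within a few tenths of a bit of the outer bound $\mathcal{R}_{\text{cj}}^{\text{out}}$.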
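The "few tenths of a bit" can be reproduced for the parameters of Figure 8.3 ($h_1 = 0.2$, $h_2 = 0.3$, $P_1 = 20$, $P_2 = 10$). The sketch below assumes base-2 logarithms and uses illustrative helper names; it computes the maximal $R_1$ and $R_2$ allowed by the outer bound of Proposition 8.4 and those achieved by cooperative jamming in Proposition 8.2, and reports the gaps.

```python
import math

def C(x):
    """C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

def outer_bound_corners(h1, h2, P1, P2):
    """Maximal R1 and R2 in the outer bound of Proposition 8.4."""
    r1 = min(C(P1 / (1 + h1 * P1)) + C(h2 * P2), C(P1))
    r2 = min(C(P2 / (1 + h2 * P2)) + C(h1 * P1), C(P2))
    return r1, r2

def coop_jamming_corners(h1, h2, P1, P2):
    """Maximal R1 and R2 achieved by cooperative jamming (Proposition 8.2)."""
    c1 = max(C(P1) - C(h1 * P1 / (1 + h2 * P2)), 0.0)
    c2 = max(C(P2) - C(h2 * P2 / (1 + h1 * P1)), 0.0)
    return c1, c2

params = (0.2, 0.3, 20, 10)   # h1, h2, P1, P2 of Figure 8.3
out = outer_bound_corners(*params)
ach = coop_jamming_corners(*params)
gaps = [o - a for o, a in zip(out, ach)]
print(out, ach, gaps)   # gaps of roughly 0.46 and 0.34 bits
```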


Remark 8.2. It might be somewhat surprising that an outer bound obtained by analyzing
the secret-key capacity of a channel model turns out to be useful. Nevertheless,
the reasonable tightness of the bound can be understood intuitively by remarking that
the addition of a public channel of unlimited capacity is a less powerful enhancement than
it may seem. In fact, the public channel does not modify the randomness of the channel,
which is the fundamental source of secrecy. As a matter of fact, we already know from
Corollary 3.1 and Theorem 4.8 that the secrecy capacity and secret-key capacity of a degraded
wiretap channel are equal. In the case of cooperative jamming, the channel also exhibits
some degradedness, which partly explains why the approach of Proposition 8.3 is
useful.
Figure 8.3 Regions $\mathcal{R}_0$, $\mathcal{R}_{\text{cj}}$, and $\mathcal{R}_{\text{cj}}^{\text{out}}$ for $h_1 = 0.2$, $h_2 = 0.3$, $P_1 = 20$, and $P_2 = 10$.


Coded cooperative jamming 


Since cooperative jamming always improves the maximum secure communication rate
of either Alice or Bob, cooperative jamming achieves rate pairs that are not achievable
with the benchmark strategy, and $\mathcal{R}_{\text{cj}} \not\subseteq \mathcal{R}_0$. Unfortunately, cooperative jamming forces
either Alice or Bob to stop transmitting information. As a result, if the channel gains
$h_1$ and $h_2$ are small ($h_1 \ll 1$ and $h_2 \ll 1$), the eavesdropper does not
suffer from much interference, and Alice and Bob might as well treat their channels
as orthogonal wiretap channels; therefore, in general, $\mathcal{R}_0 \not\subseteq \mathcal{R}_{\text{cj}}$ either, and one cannot
conclude that cooperative jamming performs strictly better than the benchmark strategy.
The numerical example in Figure 8.3 is a specific case of such a situation.

To overcome this limitation, one can wonder whether Alice and Bob could achieve the
effect of cooperative jamming while still communicating codewords. The interference
of codewords might still have a detrimental effect on the eavesdropper, but without
sacrificing the entire rate of either Alice or Bob. To emphasize that the underlying
principle is similar to cooperative jamming but that the eavesdropper observes codeword
interference, we call such schemes coded cooperative jamming codes (coded cooperative
jamming for short). Formally, coded cooperative jamming is a specific instance of a code
for the two-way wiretap channel such that

• there are two independent encoding functions, $f_1: \mathcal{M}_1 \times \mathcal{R}_1 \to \mathcal{X}_1^n$ and $f_2: \mathcal{M}_2 \times
\mathcal{R}_2 \to \mathcal{X}_2^n$;
• there are two decoding functions, $g_1: \mathcal{Y}_1^n \to \mathcal{M}_2 \cup \{?\}$ and $g_2: \mathcal{Y}_2^n \to \mathcal{M}_1 \cup \{?\}$.


Note that coded cooperative jamming does not exploit feedback, but the codebooks 
used by Alice and Bob can be optimized jointly prior to transmission to maximize the 
effectiveness of codeword interference. 
Figure 8.4 Regions $\mathcal{R}_0$, $\mathcal{R}_{\text{cj}}$, $\mathcal{R}_{\text{ccj}}$, and $\mathcal{R}_{\text{cj}}^{\text{out}}$ for $h_1 = 0.2$, $h_2 = 0.3$, $P_1 = 20$, and $P_2 = 10$.


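Proposition 8.5 (Tekin and Yener). The rate region $\mathcal{R}_{\text{ccj}}$ defined as
$$\mathcal{R}_{\text{ccj}} \triangleq \left\{(R_1, R_2): \begin{array}{l} 0 \leq R_1 \leq \left(C(P_1) - C\left(\dfrac{h_1P_1}{1 + h_2P_2}\right)\right)^+\\[2ex] 0 \leq R_2 \leq \left(C(P_2) - C\left(\dfrac{h_2P_2}{1 + h_1P_1}\right)\right)^+\\[2ex] 0 \leq R_1 + R_2 \leq \left(C(P_1) + C(P_2) - C(h_1P_1 + h_2P_2)\right)^+ \end{array}\right\}$$
is achievable with coded cooperative jamming.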
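The three constraints of Proposition 8.5 translate directly into a membership test. The sketch below (base-2 logarithms, hypothetical helper names) evaluates them for the parameters of Figure 8.4. Note that the two individual constraints coincide with the cooperative jamming corner rates of Proposition 8.2, so it is the sum-rate constraint that limits simultaneous transmission.

```python
import math

def C(x):
    """C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

def ccj_constraints(h1, h2, P1, P2):
    """Right-hand sides of the three constraints defining R_ccj (Proposition 8.5)."""
    r1 = max(C(P1) - C(h1 * P1 / (1 + h2 * P2)), 0.0)
    r2 = max(C(P2) - C(h2 * P2 / (1 + h1 * P1)), 0.0)
    rsum = max(C(P1) + C(P2) - C(h1 * P1 + h2 * P2), 0.0)
    return r1, r2, rsum

def in_ccj_region(R1, R2, h1, h2, P1, P2):
    """True if (R1, R2) satisfies all three constraints of R_ccj."""
    r1, r2, rsum = ccj_constraints(h1, h2, P1, P2)
    return 0 <= R1 <= r1 and 0 <= R2 <= r2 and R1 + R2 <= rsum

print(ccj_constraints(0.2, 0.3, 20, 10))       # about (1.696, 1.391, 2.426)
print(in_ccj_region(1.0, 1.0, 0.2, 0.3, 20, 10))
```

Here the sum-rate bound (about 2.426 bits) is smaller than the sum of the individual bounds (about 3.087 bits), which gives $\mathcal{R}_{\text{ccj}}$ its pentagonal, multiple-access-like shape.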


Remark 8.3. The region $\mathcal{R}_{\text{cj}}^{\text{out}}$ in Proposition 8.4 is also an outer bound for $\mathcal{R}_{\text{ccj}}$. In
fact, $\mathcal{R}_{\text{cj}}^{\text{out}}$ has been derived for the case of arbitrary jamming signals, which includes
codewords as a special case, and, in addition, $\mathcal{R}_{\text{cj}}^{\text{out}}$ has been obtained without requiring
the jamming signals to be decoded by the non-jamming terminal. Removing this
constraint cannot reduce the secure rates of the non-jamming terminal; therefore, $\mathcal{R}_{\text{cj}}^{\text{out}}$ is
an outer bound for $\mathcal{R}_{\text{ccj}}$.


Before we prove Proposition 8.5, it is useful to analyze its implications. As illustrated
in Figure 8.4, the shape of $\mathcal{R}_{\text{ccj}}$ is reminiscent of the capacity region of a multiple-access
channel. This similarity is not fortuitous because the channel linking Alice and
Bob to Eve is indeed a multiple-access channel, and we explicitly use it in the proof. By
comparing Proposition 8.1 and Proposition 8.5, we see that the individual rate constraints
on $R_1$ and $R_2$ are more stringent in $\mathcal{R}_0$ than they are in $\mathcal{R}_{\text{ccj}}$. In fact, if $(R_1, R_2) \in \mathcal{R}_0$,
then the sum-rate satisfies
$$R_1 + R_2 \leq C(P_1) - C(h_1P_1) + C(P_2) - C(h_2P_2),$$
Figure 8.5 Enhanced channel for coded cooperative jamming.


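which is more stringent than the sum-rate constraint in $\mathcal{R}_{\text{ccj}}$ because
$$\begin{aligned}
C(P_1) + C(P_2) - C(h_1P_1 + h_2P_2) &= C(P_1) + C(P_2) - C(h_1P_1) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\\
&\geq C(P_1) - C(h_1P_1) + C(P_2) - C(h_2P_2).
\end{aligned}$$
Hence, in contrast with $\mathcal{R}_{\text{cj}}$, we can conclude that $\mathcal{R}_0 \subseteq \mathcal{R}_{\text{ccj}}$ whenever $h_1 \leq 1$ and $h_2 \leq 1$,
that is, whenever neither $(\cdot)^+$ operation in $\mathcal{R}_0$ is active and the above comparison applies directly.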
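The two steps of this comparison, namely the identity $C(a + b) = C(a) + C(b/(1 + a))$ and the inequality $C(h_2P_2/(1 + h_1P_1)) \leq C(h_2P_2)$, can be spot-checked numerically. The sketch below draws random parameters and checks the unclipped expressions, i.e., the displayed inequality without the $(\cdot)^+$ operations; the sampling ranges and helper names are arbitrary assumptions.

```python
import math
import random

def C(x):
    """C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1 + x)

def sum_rate_gap(h1, h2, P1, P2):
    """Coded cooperative jamming sum-rate expression minus the benchmark one (no clipping)."""
    ccj = C(P1) + C(P2) - C(h1 * P1 + h2 * P2)
    benchmark = C(P1) - C(h1 * P1) + C(P2) - C(h2 * P2)
    return ccj - benchmark

def chain_identity_error(a, b):
    """|C(a + b) - C(a) - C(b / (1 + a))|, which should vanish up to rounding."""
    return abs(C(a + b) - C(a) - C(b / (1 + a)))

rng = random.Random(0)
samples = [(rng.uniform(0, 3), rng.uniform(0, 3), rng.uniform(0, 50), rng.uniform(0, 50))
           for _ in range(10_000)]
print(min(sum_rate_gap(*s) for s in samples) >= -1e-12)   # True
print(max(chain_identity_error(h1 * P1, h2 * P2) for h1, h2, P1, P2 in samples) < 1e-9)   # True
```

The gap equals $C(h_2P_2) - C(h_2P_2/(1 + h_1P_1))$, so it is nonnegative for every parameter choice; the check above only confirms this numerically.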

We now prove Proposition 8.5. The proof is based on a random-coding argument that
combines the wiretap coding technique introduced in Section 3.4.1 with the multiple-access
coding technique described in Section 2.3.2. As in Chapter 3, we start by constructing
codes for an enhanced channel, which is illustrated in Figure 8.5. This channel
enhances the original channel by

• introducing a virtual receiver, hereafter named Charlie, who observes the same output
$Z^n$ as Eve and also has access to the messages $M_1$ and $M_2$ through an error-free side
channel;

• using a message $M_{d,1}$ with uniform distribution over $[1, 2^{nR_{d,1}}]$ in place of the source
of local randomness $(\mathcal{R}_1, p_{R_1})$ and another message $M_{d,2}$ with uniform distribution
over $[1, 2^{nR_{d,2}}]$ in place of $(\mathcal{R}_2, p_{R_2})$.


Formally, a code for the enhanced channel is defined as follows. 


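Definition 8.4. A $(2^{nR_1}, 2^{nR_{d,1}}, 2^{nR_2}, 2^{nR_{d,2}}, n)$ code $\mathcal{C}_n$ for the enhanced channel consists of

• four message sets, $\mathcal{M}_1 = [\![1, 2^{nR_1}]\!]$, $\mathcal{M}_{d,1} = [\![1, 2^{nR_{d,1}}]\!]$, $\mathcal{M}_2 = [\![1, 2^{nR_2}]\!]$, and $\mathcal{M}_{d,2} = [\![1, 2^{nR_{d,2}}]\!]$;

• two encoding functions, $f_1 : \mathcal{M}_1 \times \mathcal{M}_{d,1} \to \mathcal{X}_1^n$ and $f_2 : \mathcal{M}_2 \times \mathcal{M}_{d,2} \to \mathcal{X}_2^n$;

• a decoding function $g_1 : \mathcal{Y}_1^n \to (\mathcal{M}_2 \times \mathcal{M}_{d,2}) \cup \{?\}$, which maps each channel observation $y_1^n$ to a message pair $(\hat{m}_2, \hat{m}_{d,2}) \in \mathcal{M}_2 \times \mathcal{M}_{d,2}$ or an error message $?$;

• a decoding function $g_2 : \mathcal{Y}_2^n \to (\mathcal{M}_1 \times \mathcal{M}_{d,1}) \cup \{?\}$, which maps each channel observation $y_2^n$ to a message pair $(\hat{m}_1, \hat{m}_{d,1}) \in \mathcal{M}_1 \times \mathcal{M}_{d,1}$ or an error message $?$;

• a decoding function $h : \mathcal{Z}^n \times \mathcal{M}_1 \times \mathcal{M}_2 \to (\mathcal{M}_{d,1} \times \mathcal{M}_{d,2}) \cup \{?\}$, which maps each channel observation $z^n$ and the corresponding messages $m_1$ and $m_2$ to a message pair $(\tilde{m}_{d,1}, \tilde{m}_{d,2}) \in \mathcal{M}_{d,1} \times \mathcal{M}_{d,2}$ or an error message $?$.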


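We assume that all messages are uniformly distributed in their respective sets. The reliability performance of a $(2^{nR_1}, 2^{nR_{d,1}}, 2^{nR_2}, 2^{nR_{d,2}}, n)$ code $\mathcal{C}_n$ is measured in terms of the average probability of error

$$P_e(\mathcal{C}_n) \triangleq \mathbb{P}\Big[(\hat{M}_1, \hat{M}_{d,1}) \neq (M_1, M_{d,1}) \text{ or } (\hat{M}_2, \hat{M}_{d,2}) \neq (M_2, M_{d,2}) \text{ or } (\tilde{M}_{d,1}, \tilde{M}_{d,2}) \neq (M_{d,1}, M_{d,2}) \,\Big|\, \mathcal{C}_n\Big].$$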


Since $M_{d,1}$ and $M_{d,2}$ are dummy messages that correspond to a specific choice for the sources of local randomness $(\mathcal{R}_1, p_{R_1})$ and $(\mathcal{R}_2, p_{R_2})$, a $(2^{nR_1}, 2^{nR_{d,1}}, 2^{nR_2}, 2^{nR_{d,2}}, n)$ code $\mathcal{C}_n$ for the enhanced channel is also a $(2^{nR_1}, 2^{nR_2}, n)$ coded cooperative jamming code $\mathcal{C}_n$ for the original channel; the probability of error over the original channel does not exceed the probability of error over the enhanced channel since

$$\mathbb{P}\left[(\hat{M}_1, \hat{M}_2) \neq (M_1, M_2) \,\middle|\, \mathcal{C}_n\right] \leq P_e(\mathcal{C}_n).$$

In addition, by virtue of Fano's inequality,

$$\frac{1}{n} H(M_{d,1}M_{d,2} \mid Z^n M_1 M_2 \mathcal{C}_n) \leq \delta(P_e(\mathcal{C}_n)). \tag{8.15}$$

This inequality will be useful later on to evaluate the leakage to the eavesdropper guaranteed by $\mathcal{C}_n$.

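We begin by choosing two independent probability distributions $p_{X_1}$ on $\mathcal{X}_1$ and $p_{X_2}$ on $\mathcal{X}_2$. Let $0 < \epsilon < \mu_{X_1X_2YZ}$, where

$$\mu_{X_1X_2YZ} = \min_{(x_1, x_2, y, z) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y} \times \mathcal{Z}} p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{YZ|X_1X_2}(y, z \mid x_1, x_2),$$

and let $n \in \mathbb{N}^*$. Let $R_1 > 0$, $R_{d,1} > 0$, $R_2 > 0$, and $R_{d,2} > 0$ be rates to be specified later. We construct a $(2^{nR_1}, 2^{nR_{d,1}}, 2^{nR_2}, 2^{nR_{d,2}}, n)$ code for the enhanced channel as follows.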


• Codebook construction. Construct a codebook $\mathcal{C}_1$ with $\lceil 2^{nR_1}\rceil \lceil 2^{nR_{d,1}}\rceil$ codewords labeled $x_1^n(m_1, m_{d,1})$ for $m_1 \in [\![1, 2^{nR_1}]\!]$ and $m_{d,1} \in [\![1, 2^{nR_{d,1}}]\!]$ by generating the symbols $x_{1,i}(m_1, m_{d,1})$ for $i \in [\![1, n]\!]$, $m_1 \in [\![1, 2^{nR_1}]\!]$, and $m_{d,1} \in [\![1, 2^{nR_{d,1}}]\!]$ independently according to $p_{X_1}$; similarly, construct a codebook $\mathcal{C}_2$ with $\lceil 2^{nR_2}\rceil \lceil 2^{nR_{d,2}}\rceil$ codewords labeled $x_2^n(m_2, m_{d,2})$ for $m_2 \in [\![1, 2^{nR_2}]\!]$ and $m_{d,2} \in [\![1, 2^{nR_{d,2}}]\!]$ by generating the symbols $x_{2,i}(m_2, m_{d,2})$ for $i \in [\![1, n]\!]$, $m_2 \in [\![1, 2^{nR_2}]\!]$, and $m_{d,2} \in [\![1, 2^{nR_{d,2}}]\!]$ independently according to $p_{X_2}$.

• Alice's encoder $f_1$. Given $(m_1, m_{d,1})$, transmit $x_1^n(m_1, m_{d,1})$.

• Bob's encoder $f_2$. Given $(m_2, m_{d,2})$, transmit $x_2^n(m_2, m_{d,2})$.

• Alice's decoder $g_1$. Given $y_1^n$, output $(\hat{m}_2, \hat{m}_{d,2})$ if it is the unique message pair such that $(x_2^n(\hat{m}_2, \hat{m}_{d,2}), y_1^n) \in \mathcal{T}_\epsilon^n(X_2Y_1)$; otherwise, output an error $?$.

• Bob's decoder $g_2$. Given $y_2^n$, output $(\hat{m}_1, \hat{m}_{d,1})$ if it is the unique message pair such that $(x_1^n(\hat{m}_1, \hat{m}_{d,1}), y_2^n) \in \mathcal{T}_\epsilon^n(X_1Y_2)$; otherwise, output an error $?$.

• Charlie's decoder $h$. Given $z^n$, $m_1$, and $m_2$, output $(\tilde{m}_{d,1}, \tilde{m}_{d,2})$ if it is the unique message pair such that $(x_1^n(m_1, \tilde{m}_{d,1}), x_2^n(m_2, \tilde{m}_{d,2}), z^n) \in \mathcal{T}_\epsilon^n(X_1X_2Z)$; otherwise, output an error $?$.


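The random variable that represents the randomly generated code $\mathcal{C}_n = (\mathcal{C}_1, \mathcal{C}_2)$ is denoted by $\mathsf{C}_n$. By combining the arguments used in Section 2.3.2 for the MAC and in Section 3.4.1 for the WTC, we can show that, if

$$\begin{aligned}
R_1 + R_{d,1} &< I(X_1; Y_2) - \delta(\epsilon),\\
R_2 + R_{d,2} &< I(X_2; Y_1) - \delta(\epsilon),\\
R_{d,1} &< I(X_1; Z \mid X_2) - \delta(\epsilon),\\
R_{d,2} &< I(X_2; Z \mid X_1) - \delta(\epsilon),\\
R_{d,1} + R_{d,2} &< I(X_1X_2; Z) - \delta(\epsilon),
\end{aligned}$$

then $\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta(\epsilon)$ for $n$ large enough. In particular, for the choice² $X_1 \sim \mathcal{N}(0, P_1)$ and $X_2 \sim \mathcal{N}(0, P_2)$, the constraints become

$$\begin{aligned}
R_1 + R_{d,1} &< C(P_1) - \delta(\epsilon),\\
R_2 + R_{d,2} &< C(P_2) - \delta(\epsilon),\\
R_{d,1} &< C(h_1P_1) - \delta(\epsilon),\\
R_{d,2} &< C(h_2P_2) - \delta(\epsilon),\\
R_{d,1} + R_{d,2} &< C(h_1P_1 + h_2P_2) - \delta(\epsilon).
\end{aligned} \tag{8.16}$$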


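We now compute an upper bound for $\mathbb{E}\left[\frac{1}{n}L(\mathsf{C}_n)\right]$ by following steps similar to those in Section 3.4.1:

$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}L(\mathsf{C}_n)\right] &= \frac{1}{n} I(M_1M_2; Z^n \mid \mathsf{C}_n)\\
&= \frac{1}{n} I(M_1M_2M_{d,1}M_{d,2}; Z^n \mid \mathsf{C}_n) - \frac{1}{n} I(M_{d,1}M_{d,2}; Z^n \mid M_1M_2\mathsf{C}_n)\\
&\leq \frac{1}{n} I(X_1^nX_2^n; Z^n \mid \mathsf{C}_n) - \frac{1}{n} H(M_{d,1}M_{d,2} \mid M_1M_2\mathsf{C}_n) + \frac{1}{n} H(M_{d,1}M_{d,2} \mid Z^nM_1M_2\mathsf{C}_n)\\
&= \frac{1}{n} I(X_1^nX_2^n; Z^n \mid \mathsf{C}_n) - \frac{1}{n} H(M_{d,1}M_{d,2} \mid \mathsf{C}_n) + \frac{1}{n} H(M_{d,1}M_{d,2} \mid Z^nM_1M_2\mathsf{C}_n).
\end{aligned} \tag{8.17}$$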


2 See the proof of Theorem 5.1 for a discussion about the transition to continuous channels. 
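By construction,

$$\frac{1}{n} H(M_{d,1}M_{d,2} \mid \mathsf{C}_n) \geq R_{d,1} + R_{d,2}. \tag{8.18}$$

Next, using (8.15) and $\mathbb{E}[P_e(\mathsf{C}_n)] \leq \delta(\epsilon)$, we obtain

$$\begin{aligned}
\frac{1}{n} H(M_{d,1}M_{d,2} \mid Z^nM_1M_2\mathsf{C}_n) &= \sum_{\mathcal{C}_n} p_{\mathsf{C}_n}(\mathcal{C}_n)\, \frac{1}{n} H(M_{d,1}M_{d,2} \mid Z^nM_1M_2, \mathsf{C}_n = \mathcal{C}_n)\\
&\leq \delta(n) + \mathbb{E}[P_e(\mathsf{C}_n)]\,\big(R_{d,1} + R_{d,2} + \delta(n)\big)\\
&\leq \delta(\epsilon).
\end{aligned} \tag{8.19}$$

Finally, note that $\mathsf{C}_n \to X_1^nX_2^n \to Z^n$ forms a Markov chain; therefore,

$$\frac{1}{n} I(X_1^nX_2^n; Z^n \mid \mathsf{C}_n) \leq \frac{1}{n} I(X_1^nX_2^n; Z^n) = I(X_1X_2; Z). \tag{8.20}$$

On substituting (8.18), (8.19), and (8.20) into (8.17), we obtain

$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}L(\mathsf{C}_n)\right] &\leq I(X_1X_2; Z) - R_{d,1} - R_{d,2} + \delta(\epsilon)\\
&= C(h_1P_1 + h_2P_2) - R_{d,1} - R_{d,2} + \delta(\epsilon).
\end{aligned}$$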


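Note that, for any rate pair $(R_1, R_2)$ such that

$$\begin{aligned}
R_1 &< \left(C(P_1) - C\left(\frac{h_1P_1}{1 + h_2P_2}\right)\right)^+,\\
R_2 &< \left(C(P_2) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\right)^+,\\
R_1 + R_2 &< C(P_1) + C(P_2) - C(h_1P_1 + h_2P_2),
\end{aligned} \tag{8.21}$$

we can choose a rate pair $(R_{d,1}, R_{d,2})$ that satisfies

$$\begin{aligned}
R_1 + R_{d,1} &< C(P_1) - \delta(\epsilon),\\
R_2 + R_{d,2} &< C(P_2) - \delta(\epsilon),\\
R_{d,1} + R_{d,2} &= C(h_1P_1 + h_2P_2) - \delta(\epsilon),
\end{aligned}$$

so that $R_1$, $R_{d,1}$, $R_2$, and $R_{d,2}$ satisfy the constraints in (8.16). This choice then guarantees that

$$\mathbb{E}\left[\frac{1}{n}L(\mathsf{C}_n)\right] \leq \delta(\epsilon).$$

By applying the selection lemma to the random variable $\mathsf{C}_n$ and the functions $P_e$ and $L$, we conclude that there exists a specific code $\mathcal{C}_n = (\mathcal{C}_1, \mathcal{C}_2)$ such that $P_e(\mathcal{C}_n) \leq \delta(\epsilon)$ and $\frac{1}{n}L(\mathcal{C}_n) \leq \delta(\epsilon)$; hence, the rate pairs $(R_1, R_2)$ satisfying (8.21) are achievable.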
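The choice of $(R_{d,1}, R_{d,2})$ in the proof can be made explicit. A minimal sketch, assuming $C(x) = \frac{1}{2}\log_2(1+x)$ and letting $\delta(\epsilon) \to 0$, with a small hypothetical `margin` that keeps the sum-rate constraint of (8.16) strict: given $(R_1, R_2)$ in the region (8.21), $R_{d,1}$ is placed at the midpoint of the interval allowed by (8.16) and $R_{d,2}$ is chosen so that the dummy rates sum to the target. All function names are hypothetical.

```python
import math

def C(x: float) -> float:
    """Gaussian capacity function, assumed to be C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1.0 + x)

def in_ccj_region(R1, R2, P1, P2, h1, h2) -> bool:
    """Membership test for the open region (8.21), for nonnegative rates."""
    return (R1 < C(P1) - C(h1 * P1 / (1.0 + h2 * P2))
            and R2 < C(P2) - C(h2 * P2 / (1.0 + h1 * P1))
            and R1 + R2 < C(P1) + C(P2) - C(h1 * P1 + h2 * P2))

def choose_dummy_rates(R1, R2, P1, P2, h1, h2, margin=1e-6):
    """Return (Rd1, Rd2) satisfying (8.16) with delta(eps) -> 0, or None if impossible."""
    target = C(h1 * P1 + h2 * P2) - margin   # desired value of Rd1 + Rd2
    upper1 = min(C(P1) - R1, C(h1 * P1))     # Rd1 must stay strictly below this
    upper2 = min(C(P2) - R2, C(h2 * P2))     # Rd2 must stay strictly below this
    lower1 = target - upper2                 # Rd2 = target - Rd1 < upper2  <=>  Rd1 > lower1
    if not lower1 < upper1:
        return None
    Rd1 = 0.5 * (lower1 + upper1)            # midpoint of the open interval (lower1, upper1)
    return Rd1, target - Rd1

def satisfies_816(R1, Rd1, R2, Rd2, P1, P2, h1, h2) -> bool:
    """Check the five constraints of (8.16) with delta(eps) = 0."""
    return (R1 + Rd1 < C(P1) and R2 + Rd2 < C(P2)
            and Rd1 < C(h1 * P1) and Rd2 < C(h2 * P2)
            and Rd1 + Rd2 < C(h1 * P1 + h2 * P2))

params = dict(P1=20.0, P2=1.0, h1=0.2, h2=1.3)  # example values from Figure 8.6
print(choose_dummy_rates(1.0, 0.2, **params))   # inside (8.21): a valid pair
print(choose_dummy_rates(1.4, 0.3, **params))   # violates the sum-rate bound: None
```

The interval for $R_{d,1}$ is nonempty exactly when the three inequalities of (8.21) hold, which mirrors the multiple-access structure noted after Proposition 8.5.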


Remark 8.4. Although the codes $\mathcal{C}_1$ and $\mathcal{C}_2$ are generated according to independent distributions, note that the selection lemma selects the two codes jointly; therefore, the codes are used independently by Alice and Bob but optimized jointly to guarantee secrecy.


Key-exchange 


The last scheme we analyze for the TWWTC combines some of the ideas of cooperative 
jamming with a simple feedback mechanism that allows one user to transfer part of its 
secret rate to the other user. The motivation for this scheme is the situation in which 
one of the channel gains is high, say $h_1 \gg 1$. On the one hand, Eve observes Alice’s
signals with a high SNR, which limits Alice’s secure rates even if Bob jams. On the 
other hand, Eve greatly suffers from Alice’s jamming, which increases Bob’s secure 
rates. The increase in Bob’s secure communication rates can be so great that it becomes 
advantageous for Alice to jam and help Bob send her a secret key, which she later uses 
to encrypt her messages with a one-time pad. The strategy we just described leads to the 
region of achievable rates specified in the following proposition. 


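Proposition 8.6 (He and Yener). The rate region $\mathcal{R}_{\text{fb}}$ defined by

$$\mathcal{R}_{\text{fb}} \triangleq \bigcup_{\alpha \in [0,1]} \left\{(R_1, R_2) : R_1 \leq \alpha R_1^* \text{ and } R_2 \leq (1-\alpha) R_2^*\right\},$$

with

$$R_1^* = \max_{\beta \in [0,1]} \min\left(\beta C(P_1),\ \beta\left(C(P_1) - C\left(\frac{h_1P_1}{1 + h_2P_2}\right)\right)^+ + (1-\beta)\left(C(P_2) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\right)^+\right)$$

and

$$R_2^* = \max_{\beta \in [0,1]} \min\left(\beta C(P_2),\ \beta\left(C(P_2) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\right)^+ + (1-\beta)\left(C(P_1) - C\left(\frac{h_1P_1}{1 + h_2P_2}\right)\right)^+\right),$$

is achievable with cooperative jamming and key exchange.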


Proof. We formalize the idea sketched earlier and analyze a scheme that operates in two phases of $(1-\beta)n$ and $\beta n$ channel uses, respectively, for some $\beta \in [0, 1]$. During the first $(1-\beta)n$ channel uses, Alice jams with Gaussian noise while Bob uses a wiretap code to transmit a secret key. Proposition 8.2 guarantees that Bob's key-transmission rate per $n$ channel uses can be arbitrarily close to

$$R_k = (1-\beta)\left(C(P_2) - C\left(\frac{h_2P_2}{1 + h_1P_1}\right)\right)^+. \tag{8.22}$$

Figure 8.6 Regions $\mathcal{R}_{\text{ccj}}$, $\mathcal{R}_{\text{cj}}$, and $\mathcal{R}_{\text{fb}}$ for $h_1 = 0.2$, $h_2 = 1.3$, $P_1 = 20$, and $P_2 = 1$.

During the remaining $\beta n$ channel uses, Bob jams with Gaussian noise while Alice transmits a secret message by combining a wiretap code and a one-time pad with the secret key sent by Bob, as in Section 3.6.2. By Proposition 8.2 and Proposition 3.9, the secure transmission rate can be arbitrarily close to

$$\min\left(\beta C(P_1),\ \beta\left(C(P_1) - C\left(\frac{h_1P_1}{1 + h_2P_2}\right)\right)^+ + R_k\right). \tag{8.23}$$

On substituting (8.22) into (8.23) and optimizing over $\beta$, we obtain the desired rate $R_1^*$. The rate $R_2^*$ is obtained by swapping the roles of Alice and Bob, and the full region is obtained by time-sharing between these two modes of operation.
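The optimization over $\beta$ in Proposition 8.6 is one-dimensional, so a grid search suffices to evaluate $R_1^*$ and $R_2^*$ numerically. The sketch below assumes $C(x) = \frac{1}{2}\log_2(1+x)$; `key_exchange_rate` is a hypothetical helper whose first arguments describe the user who receives the secret key and then transmits its message, so that $R_2^*$ is obtained by swapping the roles of Alice and Bob, as in the proof.

```python
import math

def C(x: float) -> float:
    """Gaussian capacity function, assumed to be C(x) = (1/2) log2(1 + x)."""
    return 0.5 * math.log2(1.0 + x)

def pos(x: float) -> float:
    """Positive part (x)^+."""
    return max(x, 0.0)

def key_exchange_rate(Pa: float, Pb: float, ha: float, hb: float, steps: int = 10_000) -> float:
    """Grid-search evaluation of R* for user a, who receives a secret key from user b.

    Phase 1 ((1 - beta) n uses): a jams, b sends a key with a wiretap code.
    Phase 2 (beta n uses): b jams, a sends its message (wiretap code + one-time pad).
    """
    direct = pos(C(Pa) - C(ha * Pa / (1.0 + hb * Pb)))  # a's secure rate while b jams
    key = pos(C(Pb) - C(hb * Pb / (1.0 + ha * Pa)))     # b's key rate while a jams
    best = 0.0
    for k in range(steps + 1):
        beta = k / steps
        rate = min(beta * C(Pa), beta * direct + (1.0 - beta) * key)
        best = max(best, rate)
    return best

# Example values from the caption of Figure 8.6.
P1, P2, h1, h2 = 20.0, 1.0, 0.2, 1.3
R1_star = key_exchange_rate(P1, P2, h1, h2)   # Alice receives the key from Bob
R2_star = key_exchange_rate(P2, P1, h2, h1)   # roles of Alice and Bob swapped
print(f"R1* = {R1_star:.3f}, R2* = {R2_star:.3f}")
```

Under these assumptions, $R_2^*$ comes out at roughly 0.449, above the value of about 0.333 that Bob obtains with $\beta = 1$ (plain cooperative jamming), whereas $R_1^*$ is attained at $\beta = 1$, so only Bob benefits from the key exchange with these parameters.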


The choice $\beta = 1$ in Proposition 8.6 eliminates the key-exchange phase of the scheme, which then reduces to the cooperative jamming in Section 8.2. Hence, we can conclude that $\mathcal{R}_{\text{cj}} \subseteq \mathcal{R}_{\text{fb}}$ for all channel parameters. The feedback scheme suffers from the same drawbacks as cooperative jamming, in that it forces either Alice or Bob to stop transmitting. Nevertheless, as illustrated in Figure 8.6, it is possible to find channel parameters for which the feedback scheme achieves rate pairs outside $\mathcal{R}_{\text{ccj}}$; this result confirms that feedback and interference are fundamentally different mechanisms, thereby calling for coding schemes that combine the two techniques.


Remark 8.5. The scheme described above is by no means the only possible scheme com- 
bining cooperative jamming and feedback. In fact, one can combine the key-exchange 
mechanism with coded cooperative jamming or adapt some of the key-distillation 
schemes described in Chapter 4. Unfortunately, it becomes rather difficult to obtain 
closed-form expressions for achievable rates. 
Bibliographical notes


Multi-user information-theoretic models with secrecy constraints have been the sub- 
ject of intensive research. We provide a non-exhaustive but, we hope, representative 
list of references. Additional information can be found in the monograph of Liang 
et al. [147] and the book edited by Liu and Trappe [148]. 

The concepts of coded cooperative jamming and cooperative jamming were introduced
by Tekin and Yener for the two-way Gaussian wiretap channel and the K-user Gaus- 
sian multiple-access channel with multiple eavesdroppers [149]. The near-optimality of 
Gaussian noise for cooperative jamming over the two-way Gaussian wiretap channel was 
established by He and Yener [150] and by Bloch [151] using two different techniques. 
The proof developed in this chapter follows the approach of Bloch, and the property 
that h(Y|Z) is maximized for X Gaussian is due to Médard [152], an extension of which 
to MIMO channels was provided by Khisti and Wornell [83]. For the multiple-access 
channel, cooperative jamming with Gaussian noise and coded cooperative jamming 
with Gaussian codebooks are suboptimal, as was observed by Tang, Liu, Spasojevic,
and Poor [153, 154]. This suboptimality was confirmed by He and Yener, who recently 
showed that structured codebooks based on lattices outperform Gaussian codebooks, 
in general [155, 156]. The idea of cooperative jamming can be applied in many other 
settings, such as broadcast channels with cooperative receivers, as studied by Ekrem and 
Ulukus [157] and Bloch and Thangaraj [158], or from a secrecy-outage perspective in a 
wireless fading environment, as done by Vilela, Bloch, Barros, and McLaughlin [159]. 
Comprehensive discussions of cooperative jamming by He and Yener and by Ekrem and 
Ulukus can be found in [148, Chapter 4 and Chapter 7]. 

The role of interference in multi-user systems has been analyzed in various settings. 
For instance, Simeone and Yener [160] and Liang, Somekh-Baruch, Poor, Shamai, 
and Verdú [161] investigated cognitive channels with secrecy constraints, in which
non-causal knowledge of other users’ messages allows the legitimate user to interfere 
intelligently and to gain an advantage over the eavesdropper. In a slightly different setting,
Mitrpant, Vinck, and Luo [162] investigated the combination of dirty-paper coding and 
wiretap coding and showed how knowledge of a non-causal interfering signal can be 
exploited for secrecy. 

The importance of feedback for secure communications was originally highlighted 
in the context of secret-key agreement, which we discussed extensively in Chapter 4. 
Nevertheless, the study of secret-key capacity relies on the existence of a public channel 
with unlimited capacity, and recent works have analyzed the role of feedback without this 
assumption. For instance, Amariucai and Wei [163], Lai, El Gamal, and Poor [164], and 
Gündüz, Brown, and Poor [165] analyzed models in which the feedback takes place over 
a noisy channel but also interferes with the eavesdropper’s observations. Ardestanizadeh, 
Franceschetti, Javidi, and Kim also analyzed a wiretap channel with rate-limited confi- 
dential feedback [34]. Note that most of these works can be viewed as special cases of the 
two-way wiretap channel. The key-exchange strategy for the two-way wiretap channel 
described in this chapter was proposed by He and Yener [150], and was combined with 
coded cooperative jamming by El Gamal, Koyluoglu, Youssef, and El Gamal [166]. All 
these works show that, in general, strategies relying on feedback perform strictly better
than do strategies without feedback. The secrecy capacity or secrecy-capacity region of 
channels with feedback remains elusive, although some headway has been made by He 
and Yener [167] for the two-way wiretap channel. 

The study of feedback is one facet of the more general problem of cooperation for 
secrecy. For instance, the trade-off between cooperation and security has been studied 
in the context of relay channels with confidential messages by Oohama [168], Lai and 
El Gamal [169], Yuksel and Erkip [170], and He and Yener [171, 172, 173]. Among the 
conclusions that can be drawn from these studies is the fact that relaying improves the 
end-to-end communication rate between a source and a destination even if the relay is 
to be kept ignorant of the messages. An overview of “cooperative secrecy” by Ekrem 
and Ulukus can be found in [148, Chapter 7].

The generalization of the broadcast channel with confidential messages to multi- 
ple receivers and multiple eavesdroppers was studied by Khisti, Tchamkerten, and 
Wornell [174] as well as by Liang, Poor, and Shamai [85]. These results can be treated 
as special cases of the compound wiretap channels investigated by Liang, Kramer, Poor, 
and Shamai [175] and Bloch and Laneman [26]. Note that compound channels are rele- 
vant in practice because they offer a way of modeling the uncertainty that one might have 
about the actual channel to the eavesdropper. The generalization of the source models 
and channel models for secret-key agreement to multiple terminals has been investigated 
by Csiszár and Narayan [176, 177, 178] as well as by Nitinawarat, Barg, Narayan, Ye,
and Reznik [179, 180]. 

In a slightly different spirit, the impact of security constraints in a network has 
been investigated by Haenggi [181] and Pinto, Barros, and Win [182] using tools from 
stochastic geometry. Secure communication in deterministic arbitrary networks has also 
been investigated by Perron, Diggavi, and Telatar [183]. 


Network-coding security


Many of the applications of classical coding techniques can be found at the physical
layer of contemporary communication systems. However, coding ideas have recently
found their way into networking research, most strikingly in the form of algebraic codes
for networks. The existing body of work on network coding ranges from determinations
of the fundamental limits of communication networks to the development of efficient,
robust, and secure network-coding protocols. This chapter provides an overview of the
field of network coding with particular emphasis on how the unique characteristics of
network codes can be exploited to achieve high levels of security with manageable
complexity. We survey network-coding vulnerabilities and attacks, and compare them
with those of state-of-the-art routing algorithms. Some emphasis will be placed on
active attacks, which can lead to severe degradation of network-coded information
flows. Then, we show how to leverage the intrinsic properties of network coding for
information security and secret-key distribution, in particular how to exploit the fact that
nodes observe algebraic combinations of packets instead of the data packets themselves.
Although the prevalent design methodology for network protocols views security as
something of an add-on to be included after the main communication tasks have been
addressed, we shall contend that the special characteristics of network coding warrant
a more comprehensive approach, namely one that gives equal importance to security
concerns. The commonalities with code constructions for physical-layer security will be
highlighted and further investigated.


9.1 Fundamentals of network coding


The main concept behind network coding is that data throughput and network robustness 
can be considerably improved by allowing the intermediate nodes in a network to mix 
different data flows through algebraic combinations of multiple datagrams. This key idea, 
which clearly breaks with the ruling store-and-forward paradigm of current message- 
routing solutions, is illustrated in Figure 9.1. To exchange messages a and b, nodes A 
and B must route their packets through node S. Clearly, the traditional scheme shown 
on top would require four transmissions. However, if S is allowed to perform network 
coding with simple exclusive-or (XOR) operations, as illustrated in the lower diagram, 
a ⊕ b can be sent in a single broadcast transmission (instead of one transmission with
b followed by another one with a). By combining the received data with the stored 
Figure 9.1 A typical wireless network-coding example.


message, A, which possesses a, can recover b, and B can recover a using b. Consequently, 
in this scenario, network coding saves one transmission (thus saving energy) and one 
time slot (thus reducing the delay). More sophisticated network-coding protocols view 
packets as a collection of symbols from a particular finite field and forward linear 
combinations of these symbols across the network, thus leveraging basic features of linear 
codes such as erasure-correction capability and well-understood encoding and decoding 
algorithms. 

The resulting techniques seem particularly useful for highly volatile networks, such 
as mobile ad-hoc networks, sensor networks, and peer-to-peer communications, where 
stringent constraints due to power restrictions, limited computation capabilities, or unpre- 
dictable user dynamics can be countered by broadcasting encoded packets to multiple 
nodes simultaneously until the destination has enough degrees of freedom to decode and 
recover the original data, as illustrated by the example in Figure 9.1. 

Using information-theoretic reasoning, it is possible to prove that the multicast capac- 
ity of a network is equal to the minimum of the maximum flows between the source 
and any of the individual destinations. Most importantly, routing alone is in general not 
sufficient to achieve this fundamental limit — intermediate nodes are required to mix the 
data units they receive from their neighbors using non-trivial coding operations. 
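The max-flow characterization of the multicast capacity can be illustrated on the butterfly network of Figure 9.2. The sketch below uses the standard Edmonds-Karp max-flow algorithm, chosen here for illustration since the text does not prescribe an algorithm, and assumes the usual butterfly topology with unit-capacity edges 1→2, 1→3, 2→4, 3→4, 4→5, 5→6, 5→7, 2→6, and 3→7. The names `max_flow`, `multicast_capacity`, and `BUTTERFLY` are hypothetical.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths found by BFS."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)          # reverse edge, initially empty
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                        # no augmenting path left
        bottleneck, v = float("inf"), t
        while parent[v] is not None:           # smallest residual capacity on the path
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:           # push the bottleneck along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

BUTTERFLY = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1, (4, 5): 1,
             (5, 6): 1, (5, 7): 1, (2, 6): 1, (3, 7): 1}

def multicast_capacity(capacity, source, sinks):
    """Minimum over the sinks of the max-flow from the source."""
    return min(max_flow(capacity, source, t) for t in sinks)

print(multicast_capacity(BUTTERFLY, 1, [6, 7]))
```

Each sink individually admits a flow of 2, so the multicast bound is 2; the point made in the text is that routing alone cannot deliver rate 2 to both sinks simultaneously, because they compete for the edge 4→5.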

The intuition behind this result is well illustrated by the butterfly network shown in 
Figure 9.2, where each edge is assumed to have unit capacity. If node 1 wishes to
send a multicast flow to sinks 6 and 7 at the max-flow min-cut bound, which in this case 
is 2, the only way to overcome the bottleneck between nodes 4 and 5 is for node 4 to 
combine the incoming symbols through an XOR operation. Sinks 6 and 7 can then use 
the symbols they receive directly from nodes 2 and 3, respectively, in order to reverse 
this XOR operation and reconstruct the desired multicast flow. 
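The coding operation at node 4 can be traced bit by bit. This minimal sketch assumes the usual butterfly topology, in which node 1 sends bit a to node 2 and bit b to node 3, nodes 2 and 3 forward their bits to node 4 and to sinks 6 and 7, respectively, and node 5 relays the output of node 4 to both sinks; every edge carries one bit per network use. The function name is hypothetical.

```python
def butterfly_multicast(a: int, b: int):
    """One use of the butterfly network with XOR coding at node 4."""
    edge = {}
    edge[(1, 2)], edge[(1, 3)] = a, b
    edge[(2, 4)] = edge[(2, 6)] = edge[(1, 2)]          # node 2 forwards a
    edge[(3, 4)] = edge[(3, 7)] = edge[(1, 3)]          # node 3 forwards b
    edge[(4, 5)] = edge[(2, 4)] ^ edge[(3, 4)]          # bottleneck carries a XOR b
    edge[(5, 6)] = edge[(5, 7)] = edge[(4, 5)]          # node 5 forwards a XOR b
    sink6 = (edge[(2, 6)], edge[(2, 6)] ^ edge[(5, 6)])  # recovers (a, b)
    sink7 = (edge[(3, 7)] ^ edge[(5, 7)], edge[(3, 7)])  # recovers (a, b)
    return sink6, sink7

for a in (0, 1):
    for b in (0, 1):
        print(a, b, butterfly_multicast(a, b))
```

Both sinks obtain the pair (a, b) in a single network use, that is, two bits per use, which matches the max-flow min-cut bound.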

It has also been shown that linear codes are sufficient to achieve the multicast capacity, 
yielding the algebraic framework for network coding that has since fueled a strong surge 
in network-coding research. 

Although establishing the information-theoretic limits of communication networks 
with multiple unicast or multicast sessions still seems a distant goal, there is reasonable 
evidence that network coding allows a trade-off of communication versus computational 
costs. Furthermore, network coding brings noticeable benefits in terms of throughput, 
Figure 9.2 Canonical network-coding example.


reliability, and fault tolerance in a variety of relevant networking scenarios beyond the 
single multicast case, including wireless broadcast and peer-to-peer systems. In particu- 
lar, random linear network coding (RLNC) provides a fully distributed methodology for 
network coding, whereby each node in the network selects independently and randomly 
a set of coefficients and uses them to form linear combinations of the data symbols (or 
packets) it receives. These linear combinations are then sent over the outgoing links 
together with the coding coefficients until the receivers are able to decode the original 
data using Gaussian elimination. 


Network-coding basics 


Network coding with XOR operations 


In the simplest form of network coding all operations are carried out in the binary field. 
In more practical terms this implies that packets are mixed (and unmixed) by means of 
straightforward XOR operations. The basic principle and the underlying advantages of 
this low-complexity approach are illustrated in Figure 9.1. Suppose that, in a wireless 
network, nodes A and B must route their packets through node S, in order to exchange 
messages a and b. S sends a ⊕ b in a single broadcast transmission (see Figure 9.1(b)).
By combining the received data with the stored message, A, which possesses packet a, 
can recover b, and B can recover a using b, in both cases with one XOR operation. 
Naturally, this idea can be easily extended to scenarios with more than three terminals. 
A node that has several packets in its buffer can choose judiciously how to combine 
the packets for the next broadcast transmission. This decision may depend on several 
factors, including the delay or bandwidth requirements, the available knowledge about 
the communication channels, the network topology, and feedback about the buffer states
of neighboring nodes. 
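The exchange of Figure 9.1 can be written out directly. A minimal sketch, assuming equal-length packets (in practice the shorter one would be padded) represented as Python `bytes`; the function names are hypothetical.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(x) != len(y):
        raise ValueError("packets must have equal length (pad the shorter one first)")
    return bytes(p ^ q for p, q in zip(x, y))

def relay_exchange(a: bytes, b: bytes):
    """A sends a to S, B sends b to S, and S broadcasts a XOR b once (three transmissions)."""
    broadcast = xor_bytes(a, b)
    b_at_A = xor_bytes(broadcast, a)   # A XORs the broadcast with its stored packet a
    a_at_B = xor_bytes(broadcast, b)   # B XORs the broadcast with its stored packet b
    return b_at_A, a_at_B

print(relay_exchange(b"hello", b"world"))  # (b'world', b'hello')
```

Each terminal needs a single XOR to recover the other packet, which is what keeps this form of network coding attractive for low-complexity nodes.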


Linear network coding 

Moving from the binary field to larger field sizes allows the nodes in the network to 
perform more sophisticated operations on the incoming packets. In the standard network- 
coding formulation the network is represented as an acyclic directed graph G = (V, E), 
where the vertices V represent the terminals, the edges E correspond to the available 
communication links, and each data unit is represented by a symbol from a predefined 
finite field $\mathbb{F}_q$ with $q = 2^m$ for some integer $m$. Suppose now that node $v$ has in its buffer a collection of symbols denoted $X_1, \ldots, X_K$ and receives symbols $Y_1, \ldots, Y_L$ from its incoming edges. In linear network coding, the next data symbol $Y_e$ to be transmitted to a different node $u$ on the outgoing edge $e = (v, u)$ is a linear combination of the available symbols and can be computed according to

$$Y_e = \sum_{i=1}^{K} \alpha_i X_i + \sum_{j=1}^{L} \beta_j Y_j,$$

where $\alpha_i, \beta_j \in \mathbb{F}_q$ are the linear coefficients used to encode the data. Since all of the
operations are modulo operations over a finite field, mixing packets through linear 
network coding does not increase the packet size. To recover the transmitted symbols, 
the destination node waits until it has enough independent linear combinations (or 
degrees of freedom) and then performs Gaussian elimination to solve the resulting linear 
system. 
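To make the encoding rule concrete, the sketch below computes $Y_e$ over $\mathbb{F}_{2^8}$ (that is, $m = 8$), where addition is bitwise XOR and multiplication is carried out modulo an irreducible polynomial. The AES polynomial $x^8 + x^4 + x^3 + x + 1$ is an assumption made for illustration; any irreducible polynomial of degree 8 would do. Applying the rule symbol by symbol to packets shows why the coded packet is no longer than its inputs. All names are hypothetical.

```python
def gf_mul(x: int, y: int) -> int:
    """Multiply two elements of GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while y:
        if y & 1:
            result ^= x          # addition in GF(2^8) is bitwise XOR
        x <<= 1
        if x & 0x100:
            x ^= 0x11B           # reduce back to 8 bits
        y >>= 1
    return result

def encode_symbol(alphas, xs, betas, ys) -> int:
    """Y_e = sum_i alpha_i X_i + sum_j beta_j Y_j over GF(2^8)."""
    out = 0
    for coeff, sym in list(zip(alphas, xs)) + list(zip(betas, ys)):
        out ^= gf_mul(coeff, sym)
    return out

def encode_packet(alphas, buffered, betas, incoming) -> bytes:
    """Apply encode_symbol position by position to equal-length packets (bytes objects)."""
    length = len((buffered or incoming)[0])
    return bytes(
        encode_symbol(alphas, [p[k] for p in buffered], betas, [p[k] for p in incoming])
        for k in range(length)
    )

coded = encode_packet([0x02, 0x03], [b"ab", b"cd"], [0x01], [b"xy"])
print(coded.hex(), len(coded))   # same length as each input packet
```

Every output symbol is again one byte, so mixing any number of packets leaves the payload size unchanged; only the coefficients must travel separately.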

An important property of linear network coding is that for a fixed network topology 
it is possible to find the optimal network code for a multicast session by means of a 
polynomial-time algorithm that establishes the exact operations that each node has to 
carry out. This is in sharp contrast with the multicast routing problem (without coding), 
which is well known to be NP-hard. An important consequence of the algorithm is that 
it allows us to bound the alphabet size for the network code or, equivalently, the required 
bandwidth per link. 


Random network coding 

If the network is large or highly dynamic, computing the network code in a centralized 
fashion may quickly become infeasible. Fortunately, it is possible to perform network
coding in a fully distributed way, by allowing the nodes to choose their linear coefficients
independently and uniformly at random over all elements of the finite field $\mathbb{F}_q$. This
approach achieves the multicast capacity of the network with high probability, provided that the field size is
sufficiently large. From a more practical point of view, random linear network coding can
be implemented effectively in packet-oriented networks by (1) segmenting the payload
into data symbols that can be viewed as elements of $\mathbb{F}_q$, (2) computing the required linear
combinations on a symbol-by-symbol basis using random coefficients, (3) placing the 
resulting symbols on the payload of the outgoing packet, and (4) updating the header 
with the random coefficients required for decoding. 
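The four steps above, together with decoding by Gaussian elimination, fit in a short sketch. It assumes $\mathbb{F}_{2^8}$ with the AES reduction polynomial, one-byte symbols, and a coefficient list carried alongside each payload in place of a real packet header; all names are hypothetical, and a practical implementation would use table-based field arithmetic.

```python
import random

def gf_mul(x: int, y: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B
        y >>= 1
    return result

def gf_inv(x: int) -> int:
    """Multiplicative inverse via x^254, since x^255 = 1 for every nonzero x."""
    if x == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, e = 1, x, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def rlnc_encode(packets, rng):
    """Steps (2)-(4): draw random coefficients, combine symbol by symbol, attach coefficients."""
    coeffs = [rng.randrange(256) for _ in packets]
    payload = [0] * len(packets[0])
    for c, p in zip(coeffs, packets):
        for k, s in enumerate(p):
            payload[k] ^= gf_mul(c, s)
    return coeffs, payload

def rlnc_decode(received, g):
    """Gauss-Jordan elimination on [coefficients | payload]; None until rank g is reached."""
    rows = [list(c) + list(p) for c, p in received]
    rank = 0
    for col in range(g):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None                      # not enough degrees of freedom yet
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = gf_inv(rows[rank][col])
        rows[rank] = [gf_mul(inv, v) for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return [bytes(rows[i][g:]) for i in range(g)]

def demo(seed=1):
    rng = random.Random(seed)
    generation = [b"net", b"cod", b"ing"]    # step (1): payloads split into byte symbols
    received = []
    while True:                              # collect coded packets until decodable
        received.append(rlnc_encode(generation, rng))
        decoded = rlnc_decode(received, len(generation))
        if decoded is not None:
            return decoded, len(received)

decoded, n_received = demo()
print(decoded, n_received)
```

The receiver never asks for a particular packet: any coded packet whose coefficient vector raises the rank of the collected matrix is equally useful.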
Table 9.1 Network coding in the layered architecture

Layer          Network-coding (NC) examples
Application    Random NC in overlay networks, distributed storage, content distribution,
               peer-to-peer applications
Transport      Flow control with special ACKs, reliable NC multicast with feedback, combined
               NC and back-pressure algorithms
Network        Deterministic NC, random NC with directed diffusion
Link           Opportunistic NC, wireless multicast with feedback
Physical       Analog NC


Upon receiving a sufficient number of packets, each destination node can obtain the 
coding matrix from the headers of the incoming packets and recover the original data 
symbols. Notice that, due to the mixing operations, each packet loses its individuality 
in the sense that the information of a single original packet is spread over multiple 
instances. Thus, as long as there are enough degrees of freedom to decode the original 
symbols, the destination is not particularly concerned with obtaining a specific packet. 


System aspects of network coding 


Having covered some of the basic principles of network coding, we are now ready to 
discuss their application in the layered architecture of the dominant communication 
networks. With this goal in mind, we shall pursue a top-down approach, from the 
application down to the physical layer, as summarized in Table 9.1. 

The application layer is particularly well suited as a first arena for the development 
of network-coding protocols. Implementing the required software at the outer edge of 
the network means that neither the routing and medium-access protocols nor the routers 
need to be modified or replaced. The routers can be kept exactly as they are. Naturally, 
the simplified protocol design at the application layer comes at the price of discarding 
any throughput and robustness benefits that network coding may bring when applied at 
the lower layers of the protocol stack. 

On the other hand, the application layer allows us to define overlay network topologies 
that are particularly useful in conjunction with random network-coding protocols. One 
of the real-life applications that exploits these synergies is the Microsoft Secure Content 
Distribution software, also known as Avalanche. Here, each node wishing to download a 
large file contacts a number of nodes through a peer-to-peer overlay network and collects 
linear combinations of file fragments until it is able to recover the entire file. The results 
indicate that significant reductions in download time and increased robustness against 
sudden changes in the network are achieved. 

Since the random-coding coefficients have to be placed in the header of the packets, 
random network-coding protocols introduce some extra overhead, which can vary a lot 
depending on the packet size, the field size, and the number of packets that are combined 
(also called a generation). This is illustrated in Table 9.2. For internet applications, 
Table 9.2 Network coding overhead per packet

IP packet size   Generation   Overhead (%)
(bytes)          size         q = 2^8    q = 2^16
1500             20           1.3        2.7
                 50           3.3        6.7
                 100          6.7        13.3
                 200          13.3       26.7
5000             20           0.4        0.8
                 50           1.0        2.0
                 100          2.0        4.0
                 200          4.0        8.0
8192             20           0.2        0.5
                 50           0.6        1.2
                 100          1.2        2.4
                 200          2.4        4.8


where the IP packet size can be fairly large, the overhead is often negligible. However, in 
wireless communication networks the size of the generation and resulting overhead must 
be further restricted because packets are generally smaller. It is also worth mentioning 
that in error-prone channels the header requires extra protection by means of error- 
correction codes, which further increase the overhead incurred. 
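The entries of Table 9.2 are consistent with a simple model in which the header carries one coefficient of $\log_2 q$ bits for every packet in the generation, with the overhead expressed relative to the IP packet size. This model is an assumed reading of the table rather than a definition given in the text; the sketch reproduces the tabulated values up to rounding in the last digit (the model gives 4.9% rather than 4.8% for 8192-byte packets, a generation size of 200, and $q = 2^{16}$).

```python
import math

def coding_overhead_percent(packet_bytes: int, generation: int, q: int) -> float:
    """Coefficient header (one log2(q)-bit coefficient per packet in the generation),
    as a percentage of the IP packet size."""
    header_bytes = generation * math.log2(q) / 8
    return 100.0 * header_bytes / packet_bytes

for size in (1500, 5000, 8192):
    for g in (20, 50, 100, 200):
        o8 = coding_overhead_percent(size, g, 2**8)
        o16 = coding_overhead_percent(size, g, 2**16)
        print(f"{size:5d} {g:4d} {o8:5.1f} {o16:5.1f}")
```

The overhead grows linearly with both the generation size and $\log_2 q$, which is why small wireless packets force smaller generations.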

In distributed-storage applications with unreliable storage elements, random network 
coding increases data persistence by spreading linear combinations in different locations 
of the network. As long as there exist enough degrees of freedom stored in the network, 
the data can be decoded even in highly volatile environments. 

The facts that the receiver has to wait until it receives enough packets to decode 
the data and that the decoding algorithm itself has non-negligible complexity ($O(n^3)$
for generation size n) have aroused some skepticism as to whether network coding 
can be used effectively in real-time applications, e.g. video streaming in peer-to-peer 
networks. Nevertheless, recent work on combining network coding with a randomized 
push algorithm shows promising results. 

At the transport layer, the Transmission Control Protocol (TCP) typically fulfills two 
different tasks: (a) it guarantees reliable and in-order delivery of the transmitted data 
from one end of the network to its final destination by means of retransmissions, and (b) 
it implements flow and congestion control by monitoring the network state, as well as the 
buffer levels at the receiver. To fulfill its purpose, TCP relies on the acknowledgments 
sent by the receiver to decide which packets must be retransmitted and to estimate vital 
metrics pertaining to the round-trip delay and the congestion level of the network. 

In the case of network-coding protocols, the transmitted packets are linear combi- 
nations of different packets within one generation, and therefore the acknowledgments 
sent by the receiving end may provide a very different kind of feedback. More specif- 
ically, instead of acknowledging specific packets, each destination node can send back 
requests for degrees of freedom that increase the dimension of its vector space and 
allow faster decoding. Upon receiving acknowledgments from the different receivers,
the source node minimizes the delay by sending the most innovative linear combination, 
i.e. the one that is useful to most destination nodes. Without network coding, end-to-end 
reliability for multicast sessions with delay constraints is generally perceived as a very 
difficult problem. 
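Acknowledging degrees of freedom rather than packets amounts to tracking the subspace spanned by the coefficient vectors each receiver has collected. The sketch below assumes, for simplicity, coding over GF(2), so that coefficient vectors fit in integer bitmasks; a coded packet is innovative for a receiver if its coefficient vector lies outside that receiver's current span. The class and function names are hypothetical.

```python
class Gf2Span:
    """Span of coefficient vectors over GF(2), stored as int bitmasks keyed by leading bit."""

    def __init__(self):
        self.basis = {}

    def reduce(self, v: int) -> int:
        """Reduce v against the basis; a nonzero result means v lies outside the span."""
        while v:
            lead = v.bit_length() - 1
            if lead not in self.basis:
                return v
            v ^= self.basis[lead]
        return 0

    def add(self, v: int) -> bool:
        """Store v if it is innovative; return True when the dimension grows."""
        r = self.reduce(v)
        if r:
            self.basis[r.bit_length() - 1] = r
        return r != 0

    @property
    def dimension(self) -> int:
        return len(self.basis)

def most_innovative(candidates, receivers):
    """Candidate coefficient vector that is innovative for the largest number of receivers."""
    return max(candidates, key=lambda v: sum(1 for rx in receivers if rx.reduce(v) != 0))

A, B = Gf2Span(), Gf2Span()
for v in (0b001, 0b010):
    A.add(v)
B.add(0b100)
print(most_innovative([0b011, 0b100, 0b111], [A, B]))
```

In this example, 0b011 is useless to receiver A and 0b100 is useless to receiver B, whereas 0b111 raises the dimension at both, so the source would send it.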

Random network coding seems to perform well in combination with back-pressure 
techniques, which analyze the differences in queue size between the two ends of a link 
in order to implement distributed flow control and thus maintain the stability of the 
network. 

The notion of a single path or route, which lies at the core of common routing algorithms, 
is arguably less meaningful when network coding is involved, because network coding seems 
most effective when there are multiple paths carrying information from the source to the destination. 
One way to proceed is to combine network coding with directed diffusion techniques, 
whereby nodes spread messages of interest in order to route the right data to the right 
nodes. Even if one opts to use a standard routing algorithm to forward the data, the 
topology-discovery phase, during which nodes broadcast link-state advertisements to 
all other nodes, is likely to benefit from the throughput gains of network-coding-based 
flooding protocols. 

At the link layer, opportunistic network coding emerges as a promising technology, 
which has been implemented successfully in a real wireless mesh network. The main 
idea is that nodes in a wireless network can learn a lot about the flow of information 
to neighboring nodes simply by keeping their radios in promiscuous mode during idle 
times. On the basis of this acquired knowledge and the buffer state information sent 
by neighboring nodes, it is possible to solve a local multicast problem in the one-hop 
neighborhood of any given node. The XOR combination of queued packets that is useful 
to most neighbors can then be broadcast, thus minimizing the number of transmissions 
and saving valuable bandwidth and power. To reap the largest possible benefits from this 
approach, the medium-access protocols and scheduling algorithms at the link layer must 
be redesigned to allow opportunistic transmission. 

Network-coding ideas have recently found their way to the physical layer by means of 
a simple communication scheme whereby two nodes send their signals simultaneously 
to a common relay over a wireless channel. The transmitted signals interfere in an 
additive way and the relay amplifies and broadcasts back the signal it received via its 
own antenna. This form of analog network coding, which has already been implemented 
using software-defined radios, further reduces the number of transmissions required for 
the nodes in Figure 9.1 to communicate effectively with each other. There is a strong rela- 
tionship between the resulting schemes and the relay methods that have been proposed 
in the area of cooperative transmission among multiple receivers (see also Chapter 8). 
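The following sketch mimics the two-way relay exchange in an idealized setting. The relay receives the sum of the two signals, amplifies it by a gain, and broadcasts it back. Each end node then removes the gain and subtracts its own contribution. Noise, fading, and synchronization errors are all ignored, and the gain is assumed to be known to both end nodes. The sample values are hypothetical.

```python
def relay_superposition(s_a, s_b, gain):
    """The relay receives the additive superposition of both signals and amplifies it."""
    return [gain * (a + b) for a, b in zip(s_a, s_b)]

def recover_other(broadcast, own, gain):
    """An end node removes the (known) gain and subtracts its own transmission."""
    return [y / gain - x for y, x in zip(broadcast, own)]

s_a = [1.0, -1.0, 1.0, 1.0]    # samples sent by node A (hypothetical antipodal symbols)
s_b = [-1.0, -1.0, 1.0, -1.0]  # samples sent simultaneously by node B
gain = 2.0                     # relay amplification factor (assumed known)

y = relay_superposition(s_a, s_b, gain)  # a single relay broadcast serves both directions
print(recover_other(y, s_a, gain))       # A obtains B's samples
print(recover_other(y, s_b, gain))       # B obtains A's samples
```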


Practical network-coding protocols 


Figure 9.3 An instance of random linear network coding. 

Although network coding is a fairly recent technology, protocol proposals already exist 
regarding the use of network coding for higher throughput or robustness in various 
applications and communication networks. From the point of view of network 
security, it is useful to divide network-coding protocols into two main classes: 


(1) stateless network-coding protocols, which do not rely on any form of state informa- 
tion to decide when and how to mix different packets in the sender queue; 

(2) state-aware network-coding protocols, which rely on partial or full network state 
information (for instance buffer states of neighboring nodes, network topology, or 
link costs) to compute a network code or determine opportunities to perform network 
coding in a dynamic fashion. 


As will be demonstrated in Section 9.5, the security vulnerabilities of the protocols 
in the first and second classes are quite different from each other, most notably because 
the latter require state information and node identification to be disseminated in the 
network and are thus vulnerable to a wide range of impersonation and control-traffic 
attacks. 


Stateless network-coding protocols 

As mentioned earlier, random linear network coding (RLNC) is a completely distributed 
methodology for combining different information flows and therefore leads to stateless 
protocol design. The basic principle is that each node in the network selects a set of 
coefficients independently and randomly and then sends linear combinations of the data 
symbols (or packets) it receives. Figure 9.3 illustrates the linear operations carried out 
at intermediate node v (using integers for simplicity). The symbols x, y, and z denote 
the native packets, which convey the information to be obtained at the receivers via 
Gaussian elimination. P1 and P2 arrive at intermediate node v in the network through 
its incoming links. P3, which is sent through the only outgoing link of node v, is the 
result of a random linear combination of P1 and P2 at node v, with chosen coefficients 
1 and 2, respectively. 
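To make the bookkeeping concrete, the sketch below represents each packet by its global encoding vector over the native packets x, y, and z, together with its payload. It forms P3 = 1·P1 + 2·P2 as in the example above. The contents of P1 and P2 are hypothetical choices, since Figure 9.3 is not reproduced here. As in the text, plain integers are used instead of finite-field elements.

```python
from dataclasses import dataclass

@dataclass
class CodedPacket:
    coeffs: list   # global encoding vector with respect to the native packets (x, y, z)
    payload: list  # coded payload symbols

def combine(packets, local_coeffs):
    """Linear combination at an intermediate node. The header (encoding vector) and the
    payload are combined with the same local coefficients."""
    n_coeffs = len(packets[0].coeffs)
    n_sym = len(packets[0].payload)
    coeffs = [sum(c * p.coeffs[i] for c, p in zip(local_coeffs, packets)) for i in range(n_coeffs)]
    payload = [sum(c * p.payload[j] for c, p in zip(local_coeffs, packets)) for j in range(n_sym)]
    return CodedPacket(coeffs, payload)

# Hypothetical native packets x, y, z, each with a two-symbol payload
x, y, z = [1, 2], [3, 4], [5, 6]
P1 = CodedPacket([1, 1, 0], [a + b for a, b in zip(x, y)])  # P1 = x + y
P2 = CodedPacket([0, 1, 1], [a + b for a, b in zip(y, z)])  # P2 = y + z
P3 = combine([P1, P2], [1, 2])                              # P3 = 1*P1 + 2*P2 at node v
print(P3)  # CodedPacket(coeffs=[1, 3, 2], payload=[20, 26])
```

Because the header is transformed together with the payload, P3 still tells the receivers that it equals x + 3y + 2z.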
Recall that the global encoding vector, i.e. the set of coefficients used to encode 
the packets, is sent along in the packet header to ensure that the end receivers are 
capable of decoding the original data. It has been shown that, if the coefficients 
are chosen at random from a large enough field, then Gaussian elimination succeeds with 
overwhelming probability. RLNC can also be used in asynchronous packet networks, 
and it has been shown to be capacity-achieving even on lossy packet networks. 
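The decoding step can be sketched as follows. A receiver stacks the global encoding vectors it has collected, augments them with the payloads, and runs Gauss–Jordan elimination. The sketch works over the prime field GF(257) so that ordinary modular arithmetic suffices; this is an assumption made for illustration. Practical systems typically use GF(2^8) or GF(2^16), which need different multiplication and inversion routines.

```python
P = 257  # prime field size (illustrative assumption)

def decode(coeff_rows, payload_rows, p=P):
    """Recover the native packets by Gauss-Jordan elimination mod p.
    Returns None if the collected encoding vectors do not have full rank."""
    h = len(coeff_rows[0])
    m = [list(c) + list(d) for c, d in zip(coeff_rows, payload_rows)]  # [coeffs | payload]
    row = 0
    for col in range(h):
        pivot = next((r for r in range(row, len(m)) if m[r][col] % p), None)
        if pivot is None:
            return None                      # rank deficient: wait for more packets
        m[row], m[pivot] = m[pivot], m[row]
        inv = pow(m[row][col], -1, p)        # modular inverse (Python 3.8+)
        m[row] = [(v * inv) % p for v in m[row]]
        for r in range(len(m)):
            if r != row and m[r][col] % p:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[row])]
        row += 1
    return [m[i][h:] for i in range(h)]

native = [[1, 2], [3, 4], [5, 6]]            # hypothetical native packets x, y, z
coeffs = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # global encoding vectors received
coded = [[sum(c * native[k][j] for k, c in enumerate(row)) % P for j in range(2)]
         for row in coeffs]
print(decode(coeffs, coded))                 # [[1, 2], [3, 4], [5, 6]]
```

If the third vector were [1, 3, 2] instead, which equals the first vector plus twice the second, the function would return None.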

A framework for packetized network coding (Practical Network Coding, PNC) lever- 
ages RLNC’s resilience against disruptions such as packet loss, congestion, and changes 
of topology, in order to guarantee robust communication over highly dynamic networks 
with minimal (or no) control information. The framework defines a packet format and a 
buffering model. The packet format includes in its header the global encoding vector, 
which is the set of linear transformations that the original packet goes through on its path 
from the source to the destination. The payload of the packets is divided into vectors 
according to the field size ($2^8$ or $2^{16}$, i.e. each symbol has 8 or 16 bits, respectively). 
Each of these symbols is then used as a building block for the linear operations performed by 
the nodes. 
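To illustrate how a node operates on byte-sized symbols, the sketch below implements multiplication in GF(2^8). It then forms a symbol-wise linear combination of byte payloads, in which addition is bitwise XOR. The reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the one used in AES) is an assumption for the example; the framework described above does not fix it.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two GF(2^8) elements (bytes) by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce modulo the field polynomial
        b >>= 1
    return result

def combine_payloads(payloads, coeffs):
    """Symbol-wise linear combination sum_i coeffs[i] * payloads[i] over GF(2^8)."""
    out = bytearray(len(payloads[0]))
    for c, pkt in zip(coeffs, payloads):
        for j, sym in enumerate(pkt):
            out[j] ^= gf256_mul(c, sym)
    return bytes(out)

p1 = b"\x57\x01"
p2 = b"\x83\x02"
print(combine_payloads([p1, p2], [1, 1]).hex())  # d403 (plain XOR when both coefficients are 1)
print(hex(gf256_mul(0x57, 0x83)))                # 0xc1
```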

The buffering model divides the stream of packets into generations of size h, such that 
packets in the same generation are tagged with a common generation number. Each node 
sorts the incoming packets in a single buffer according to their generation number. When 
there is a transmission opportunity at an outgoing edge, the sending node generates a 
new packet, which contains a random linear combination of all packets in the buffer that 
belong to the current generation. If a packet is non-innovative, i.e. if it does not increase 
the rank of the decoding matrix available at the receiving node, then it is immediately 
discarded. As soon as the matrix of received packets has full rank, Gaussian elimination 
is performed at the receivers to recover the original packets. 
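A minimal sketch of the receiver side of this buffering model is given below. It assumes a prime field GF(257), and the class and method names are hypothetical. Each arriving coded packet is reduced against the rows already stored. If it reduces to zero it is non-innovative and is discarded. Otherwise it is kept and the rank grows by one. Decoding becomes possible once the rank reaches the generation size h.

```python
class GenerationBuffer:
    """Receiver buffer for one generation of size h over GF(p), p prime (257 assumed)."""

    def __init__(self, h, p=257):
        self.h, self.p = h, p
        self.rows = {}  # pivot column -> row [coeffs | payload], kept in reduced echelon form

    @property
    def rank(self):
        return len(self.rows)

    def receive(self, coeffs, payload):
        """Add a coded packet; return True if it is innovative, False if it is discarded."""
        p = self.p
        row = [v % p for v in list(coeffs) + list(payload)]
        for col, prow in self.rows.items():            # eliminate already-known pivots
            if row[col]:
                f = row[col]
                row = [(a - f * b) % p for a, b in zip(row, prow)]
        pivot = next((c for c in range(self.h) if row[c]), None)
        if pivot is None:
            return False                               # non-innovative: discard
        inv = pow(row[pivot], -1, p)
        row = [(v * inv) % p for v in row]
        for col, prow in list(self.rows.items()):      # keep stored rows fully reduced
            if prow[pivot]:
                f = prow[pivot]
                self.rows[col] = [(a - f * b) % p for a, b in zip(prow, row)]
        self.rows[pivot] = row
        return True

    def decode(self):
        """Return the h native packets once the rank is full, else None."""
        if self.rank < self.h:
            return None
        return [self.rows[c][self.h:] for c in range(self.h)]

native = [[1, 2], [3, 4], [5, 6]]  # hypothetical generation of h = 3 packets

def encode(c):
    """Hypothetical sender: combine the native packets with coefficient vector c."""
    return [sum(ci * pkt[j] for ci, pkt in zip(c, native)) % 257 for j in range(2)]

buf = GenerationBuffer(h=3)
print(buf.receive([1, 1, 0], encode([1, 1, 0])))  # True
print(buf.receive([0, 1, 1], encode([0, 1, 1])))  # True
print(buf.receive([1, 3, 2], encode([1, 3, 2])))  # False: equals P1 + 2*P2
print(buf.receive([1, 0, 1], encode([1, 0, 1])))  # True
print(buf.decode())                               # [[1, 2], [3, 4], [5, 6]]
```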

RLNC seems particularly beneficial in dynamic and unstable networks, that is, 
networks in which the structure or topology varies within a short time, 
such as mobile ad-hoc networks and peer-to-peer content-distribution networks. The 
benefits of RLNC in wireless environments with rare and limited connectivity, due either 
to mobility or to battery scarcity, have motivated an algorithm aimed at reducing the overhead 
of probabilistic routing algorithms, with applications in delay-tolerant networks. 

The potential impact of RLNC in content-distribution networks can be analyzed as 
follows. Since each node forwards a random linear combination independently of the 
information present at other nodes, its operation is completely desynchronized and 
locally based. Moreover, when collecting a random combination of packets from a ran- 
domly chosen node, there is a high probability of obtaining a linearly independent packet 
in each time slot. Thus, the problem of redundant transmissions, which is typical of tradi- 
tional flooding approaches, is considerably reduced. In content-distribution applications, 
there is no need to download one particular fragment. Instead, any linearly independent 
segment brings innovative information. A possible practical scheme for large files allows 
nodes to make forwarding decisions solely on the basis of local information. It has been 
shown that network coding not only improves the expected file-download time, but also 
improves the overall robustness of the system with respect to frequent changes in the 
network. 
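The claim that a randomly collected combination is very likely to be linearly independent can be quantified under two simplifying assumptions made here. First, the sending node holds the full generation of h packets. Second, it draws its coefficients uniformly from a field of size q. Suppose the receiver's encoding vectors currently span a subspace V of dimension r < h. A new packet is non-innovative exactly when its coefficient vector falls in V, which gives the first expression. The second expression is the probability that h independently drawn combinations are jointly decodable:

```latex
\Pr[\text{non-innovative}] = \frac{|V|}{|\mathbb{F}_q^h|} = \frac{q^{r}}{q^{h}} = q^{-(h-r)},
\qquad
\Pr[\text{full rank after } h \text{ packets}] = \prod_{j=1}^{h}\bigl(1 - q^{-j}\bigr) \;\ge\; 1 - \frac{1}{q-1}.
```

For q = 2^8, for instance, the second bound already exceeds 0.996 for every generation size h.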
The performance of network coding has been compared with that of traditional coding 
schemes in a distributed-storage setting in which each storage location has only very 
limited storage space for each file and the objective is that a file downloader connects 
to as few storage locations as possible to retrieve a file. It was shown that RLNC 
performs well without requiring a large amount of additional storage space at a 
centralized server. There also exists a general graph-theoretic framework for computing 
lower bounds on the bandwidth required to maintain distributed-storage architectures, and 
RLNC is now known to achieve these lower bounds. 


State-aware network-coding protocols 

State-aware network-coding protocols rely on the available state information to optimize 
the coding operations carried out by each node. The optimization process may target 
the throughput or the delay, among other performance metrics. Its scope can be local 
or global, depending on whether the optimization affects only the operations within the 
close neighborhood of a node or addresses the end-to-end communication across the 
entire network. If control traffic is exchanged between neighbors, each node can perform 
a local optimization step to decide on-the-fly how to mix and transmit the received data 
packets. One way to implement protocols with global scope is to run a polynomial-time 
code-construction algorithm on the network graph in order to determine the optimal network 
code before the communication starts. By exploiting 
the broadcast nature of the wireless medium and spreading encoded information in 
a controlled manner, state-aware protocols promise considerable advantages in terms 
of throughput, as well as resilience with respect to node failures and packet losses. 
Efficiency gains come mainly from the fact that nodes make use of every data packet 
they overhear. In some instances network coding reduces the required amount of control 
information. 

As an example, the COPE protocol inserts a coding layer between the IP and MAC 
layers for detecting coding opportunities. More specifically, nodes overhear and store 
packets that are exchanged within their radio range, and then send reception reports to 
inform their neighbors about which packets they have stored in their buffers. On the basis 
of these updates, each node computes the optimal XOR mixture of multiple packets in 
order to reduce the number of transmissions. It has been shown that this approach can 
lead to strong improvements in terms of throughput and robustness to network dynamics. 
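The core coding decision can be sketched as follows. The sketch assumes that each node knows, from reception reports, which packets every neighbor already holds. A set of packets can be XORed together only if every intended next hop already holds all the other packets in the set, so that it can cancel them and recover its own packet. The greedy scan over the output queue and all packet and node names are illustrative assumptions. This is not the exact COPE algorithm.

```python
def choose_xor_set(queue, known):
    """Greedily pick packets from the output queue to XOR together.

    queue: list of (packet_id, next_hop) in FIFO order.
    known: dict next_hop -> set of packet_ids that neighbor already holds.
    A set is decodable if each packet's next hop holds every other packet in the set.
    """
    chosen = []
    for pkt, hop in queue:
        candidate = chosen + [(pkt, hop)]
        if all(all(other == p or other in known.get(h, set()) for other, _ in candidate)
               for p, h in candidate):
            chosen = candidate
    return chosen

def xor_bytes(packets):
    """XOR equal-length payloads together."""
    out = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            out[i] ^= b
    return bytes(out)

# Hypothetical relay queue: p1 is for B, p2 for A, p3 for C
queue = [("p1", "B"), ("p2", "A"), ("p3", "C")]
known = {"A": {"p1"}, "B": {"p2"}, "C": {"p1"}}           # from reception reports
print([p for p, _ in choose_xor_set(queue, known)])       # ['p1', 'p2']

payloads = {"p1": b"\x10\x20", "p2": b"\x0f\x0f"}
coded = xor_bytes([payloads["p1"], payloads["p2"]])       # one broadcast instead of two
print(xor_bytes([coded, payloads["p2"]]) == payloads["p1"])  # B cancels p2: True
```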


Security vulnerabilities 


The aforementioned network-coding protocols expect a well-behaved node to do the 
following: 


• encode the received packets correctly, thus contributing to the expected benefits of 
network coding; 

• forward the encoded packets correctly, thus enabling the destination nodes to retrieve 
the intended information; 

• ignore data for which it is not the intended destination, thus fulfilling basic 
confidentiality requirements. 


In the case of state-aware network-coding protocols, we must add one more rule: 


• the node must participate in the timely dissemination of correct state information, 
thus contributing to a sound knowledge base for network-coding decisions. 


It follows that an attack on a network-coding-based protocol must result from one or 
more network nodes breaking one or more of these basic rules. The means to achieve a 
successful attack are, of course, highly dependent on the specific rules of the protocol, 
and therefore it is reasonable to distinguish between the aforementioned stateless and 
state-aware classes of network-coding schemes. 

If properly applied, stateless network-coding protocols based on RLNC are potentially less 
subject to some of the typical security issues of traditional routing protocols. First, 
stateless protocols do not depend on exchange of topology or buffer-state information, 
which can be faked (e.g. through link-spoofing attacks). Second, the impact of traffic- 
relay refusal is reduced, due to the inherent robustness that results from spreading the 
information by means of network coding. Third, the information retrieval depends solely 
on the data received and not on the identity of nodes, which ensures some protection 
against impersonation attacks. 

In contrast, state-aware network-coding protocols rely on vulnerable control informa- 
tion disseminated among nodes in order to optimize the encodings. On the one hand, 
this property renders them particularly prone to attacks based on the generation of false 
control information. On the other hand, control-traffic information can also be used 
effectively against active attacks, such as the injection of erroneous packets. For pro- 
tocols with local scope, the negative impact of active attacks is limited to a confined 
neighborhood, whereas with end-to-end network coding the consequences can be much 
more devastating. Opportunistic network-coding protocols, which rely on the informa- 
tion overheard by neighboring nodes over the wireless medium, are obviously more 
amenable to eavesdropping attacks than are their wireline counterparts. 

More generally, in comparison with traditional routing, the damage caused by a 
malicious (Byzantine) node injecting corrupted packets into the information flow is 
likely to be higher with network coding, irrespective of whether a protocol is stateless or 
state-aware. Since network coding relies on mixing the content of multiple data packets, 
a single corrupted packet may very easily corrupt the entire information flow from the 
sender to the destination at any given time. 
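The following sketch illustrates this effect using the matrix view of decoding. A sink solves a linear system built from the coded packets it receives. If a single symbol of a single coded packet has been tampered with, the inverse transfer matrix spreads the error over several decoded packets. The field GF(257), the coefficient matrix, and the data are hypothetical.

```python
P = 257

def solve_mod(A, Y, p=P):
    """Solve A X = Y over GF(p) for an invertible square A (Gauss-Jordan elimination)."""
    n = len(A)
    m = [list(a) + list(y) for a, y in zip(A, Y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] % p)
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [(v * inv) % p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[col])]
    return [row[n:] for row in m]

X = [[1, 2], [3, 4], [5, 6]]               # native packets (hypothetical)
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]      # global encoding vectors at the sink
Y = [[sum(A[i][k] * X[k][j] for k in range(3)) % P for j in range(2)] for i in range(3)]

Y_bad = [row[:] for row in Y]
Y_bad[0][0] = (Y_bad[0][0] + 1) % P        # a Byzantine node alters one symbol of one packet

good = solve_mod(A, Y)
bad = solve_mod(A, Y_bad)
print(good == X)                                # True
print([r for r in range(3) if bad[r] != X[r]])  # [0, 1, 2]: every decoded packet is wrong
```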


Securing network coding against passive attacks 


Having discussed the specific vulnerabilities of network coding, we now turn our attention 
to the next natural question, namely how to find appropriate mechanisms for securing 
network-coding protocols. Our main goal here is to show how the specific characteristics 
of network coding can be leveraged to counter some of the threats posed by 
eavesdroppers and Byzantine attackers. We shall start by presenting countermeasures 
against passive attacks, with special emphasis on three different scenarios. First, we shall 
consider nice but curious nodes, which do not break any of the established rules except 
for not ignoring the data for which they are not the intended receivers. In the second 
instance, the eavesdropper is able to wiretap a subset of network links. Finally, the third 
type of attacker is a worst-case eavesdropper who is given full access to all the traffic in 
the network. 

Figure 9.4 Example of algebraic security. The top scheme discloses data to intermediate nodes, 
whereas the bottom scheme can be deemed algebraically secure. 


Nice but curious nodes 

Consider first a threat model in which the network consists entirely of nice but curious 
nodes, which comply with the communication protocols (in that sense, they are well 
behaved), but may try to acquire as much information as possible from the data flows 
that pass through them (in which case, they are potentially ill-intentioned). Under this 
scenario, stateless protocols that exploit the RLNC scheme described in Section 9.4 
possess an intrinsic security feature: depending on the size of the code alphabet and the 
topology of the network, it is in many instances unlikely that an intermediate node will 
have enough degrees of freedom to perform Gaussian elimination and gain access to the 
transmitted data set. 

On the basis of this observation, it is possible to characterize the threat level posed by 
an intermediate node according to an algebraic security criterion that takes into account 
the number of components of the global encoding vector it receives. In the example 
of Figure 9.4, which uses integers for simplicity, the upper (uncoded) transmission 
scheme leaves partial data unprotected, whereas in the lower (network-coding) scheme 
the intermediate nodes 2 and 3 are not able to recover the data symbols. 
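One way to make this criterion operational is sketched below, assuming a prime field GF(257) and hypothetical encoding vectors. An intermediate node can recover native symbol i exactly when the unit vector e_i lies in the span of the global encoding vectors it has observed. This can be tested by checking whether appending e_i increases the rank. The two nodes in the example mirror the contrast drawn in Figure 9.4.

```python
def rank_mod(rows, p=257):
    """Rank of a list of vectors over GF(p) via Gaussian elimination."""
    m = [[v % p for v in r] for r in rows]
    rank, ncols = 0, len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def decodable_symbols(enc_vectors, h, p=257):
    """Indices i such that native symbol i is recoverable from the observed vectors."""
    base = rank_mod(enc_vectors, p)
    out = []
    for i in range(h):
        e_i = [1 if j == i else 0 for j in range(h)]
        if rank_mod(enc_vectors + [e_i], p) == base:  # e_i already lies in the span
            out.append(i)
    return out

# Hypothetical intermediate nodes observing h = 2 native symbols (a, b)
uncoded_node = [[1, 0]]   # receives a in the clear
coded_node = [[1, 1]]     # receives only a + b
print(decodable_symbols(uncoded_node, 2))  # [0]
print(decodable_symbols(coded_node, 2))    # []
```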
Wiretappers with access to some links 
A different threat model, which is commonly found in the recent literature on secure 
network coding, assumes that one or more external eavesdroppers (or wiretappers) have 
access to a subset of the available communication links. The crux of the problem is then 
the need to find code constructions capable of splitting the data among different links in 
such a way that reconstruction by the attackers is either very difficult or impossible. Under 
this assumption, there exist secure linear network codes that achieve perfect information- 
theoretic secrecy for single-source multicast transmission. The corresponding secure 
network-coding problem can in fact be cast as a variant of the so-called wiretap channel 
type-II problem. There, the eavesdropper is able to select and access u symbols of the 
n coded symbols transmitted by the legitimate sender. It can be shown that in that case 
a maximum of $k = n - u$ information symbols can be transmitted in perfect secrecy. 

These results can be generalized to multi-source linear network codes by using the 
algebraic structure of such codes to derive necessary and sufficient conditions for their 
security. More specifically, it has been shown that a suitably constructed code maximizes 
the amount of secure multicast information while minimizing the necessary amount 
of randomness. Such a coding scheme can achieve a maximum possible rate of $n - u$ 
information-theoretically secure packets, where n is now the number of packets sent from 
the source to each receiver and u denotes the number of links that the wiretapper can 
observe. Such a scheme can be applied on top of any communication network without requiring 
any knowledge of the underlying network code and without imposing any coding constraints. 
The basic idea is to use a “nonlinear” outer code, which is linear over an extension field 
$\mathbb{F}_{q^m}$, and to exploit the benefits of this extension field. 

Some contributions propose a different criterion whereby a system is deemed secure if an 
eavesdropper is unable to get any uncoded or immediately decodable (also called meaningful) 
source data. Other contributions exploit the network topology to ensure that an attacker is 
unable to get any meaningful information and add a cost function to the secure network-coding 
problem. The problem then becomes finding a coding scheme that minimizes both the 
network cost and the probability that the attacker is able to retrieve all the messages of 
interest. 
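The bound k = n - u can be checked on the smallest nontrivial case, n = 2 and u = 1. The example uses a simple coset-style scheme chosen for illustration, not a construction from the works discussed above. The source draws a uniform key symbol r and sends (r, m + r) over GF(q). The sketch enumerates all messages and keys for a small prime q. It verifies that the symbol seen on either single link has the same distribution for every message, so it reveals nothing about m. A legitimate receiver that sees both symbols still recovers m = (m + r) - r.

```python
from collections import Counter

def encode(m, r, q):
    """Send the key r on link 1 and the masked message m + r on link 2 (mod q)."""
    return (r % q, (m + r) % q)

def link_distribution(m, link, q):
    """Distribution of the symbol observed on one link when the key r is uniform over GF(q)."""
    return Counter(encode(m, r, q)[link] for r in range(q))

q = 5  # small prime field (illustrative)
for link in (0, 1):
    dists = {m: link_distribution(m, link, q) for m in range(q)}
    # Perfect secrecy against a wiretapper on any single link: identical distributions
    assert all(d == dists[0] for d in dists.values())

# Legitimate receiver observing both links
m, r = 3, 4
x1, x2 = encode(m, r, q)
print((x2 - x1) % q)  # 3
```

Here one of the n = 2 transmitted symbols is spent on randomness, which leaves k = 1 secure information symbol.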

Since state-aware network-coding protocols with local code optimization (e.g. COPE) 
expect neighboring nodes to be able to decode all the packets they receive, confidentiality 
must be ensured by means of end-to-end encryption. 


Worst-case eavesdroppers 

In this case, the threat model is one in which the attacker has access to all the packets 
traversing the network but not to the secret keys shared among legitimate communicat- 
ing parties. Secure Practical Network Coding (abbreviated as SPOC, not SPNC) is a 
lightweight security scheme for confidentiality in RLNC, which provides a simple yet 
powerful way to exploit the inherent security of RLNC in order to reduce the number 
of cryptographic operations required for confidential communication. This is achieved 
by protecting (or “locking”) only the source coefficients required in order to decode the 
linearly encoded data, while allowing intermediate nodes to run their network-coding 
operations on substitute “unlocked” coefficients that provably do not compromise the 
hidden data. 
To evaluate the level of security provided by SPOC, one analyzes the mutual information 
between the encoded data and the two components that can lead to information 
leakage, namely the matrices of random coefficients and the original data. This analysis, 
which is independent of any particular cipher used for locking the coefficients, assumes 
that the encoding matrices are based on variants of RLNC and can be accessed only by 
the source and sinks. The results, some of which hold even with finite block lengths, 
prove that information-theoretic security is achievable for any field size without loss in 
terms of decoding probability. In other words, since correlation attacks based on the 
encoded data become impossible, protecting the encoding matrix is generally sufficient 
to ensure the confidentiality of network-coded data. 


Countering Byzantine attacks 


Although Byzantine attacks can have a severe impact on the integrity of network-coded 
information, the specific properties of linear network codes can be used effectively 
to counteract the impairments caused by traffic-relay refusal or injection of erroneous 
packets. In particular, RLNC has been shown to be very robust with respect to packet 
losses induced by node misbehavior. More sophisticated countermeasures, which modify 
the format of coded packets, can be subdivided into two main categories: (1) end-to-end 
error correction and (2) misbehavior detection, which can be carried out either packet 
by packet or in generation-based fashion. 


End-to-end error correction 
The main advantage of end-to-end error-correcting codes is that the burden of applying 
error-control techniques is left entirely to the source and the destinations, such that 
intermediate nodes are not required to change their mode of operation. The typical 
transmission model for end-to-end network coding is well described by a matrix channel 
Y = AX + Z, where X corresponds to the matrix whose rows are the transmitted packets, 
Y is the matrix whose rows are the received packets, Z denotes the matrix corresponding 
to the injected error packets after propagation over the network, and A describes the 
transfer matrix, which corresponds to the global linear transformation performed on 
packets as they traverse the network. In terms of performance, error-correction schemes 
can correct up to the min-cut between the source and the destinations. Rank-metric error- 
correcting codes in RLNC under this setting appear to work well, including in scenarios in 
which the channel may supply partial information about erasures and deviations from the 
sent information flow. Still under the same setting, a probabilistic error model for random 
network coding provides bounds on capacity and presents a simple coding scheme with 
polynomial complexity that achieves capacity with an exponentially low probability of 
failure with respect to both the packet length and the field size. Bounds on the maximum 
achievable rate in an adversarial setting can be obtained from generalizations of the 
Hamming and Gilbert–Varshamov bounds. 
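The sketch below shows why a rank metric suits this model. It uses GF(257) and hypothetical matrices to simulate Y = AX + Z. A single erroneous packet e is injected somewhere in the network and reaches the sink through a propagation vector b, so that Z is the outer product of b and e. The injected packet corrupts several received rows, yet the error matrix Z has rank one. Rank-metric codes measure errors by the rank of Z rather than by the number of corrupted rows.

```python
P = 257

def matmul(A, B, p=P):
    """Matrix product over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def matadd(A, B, p=P):
    """Matrix sum over GF(p)."""
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank_mod(rows, p=P):
    """Rank over GF(p) via Gaussian elimination."""
    m = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

X = [[1, 2, 3], [4, 5, 6]]    # two transmitted packets (rows)
A = [[1, 0], [1, 1], [2, 1]]  # transfer matrix for a sink with three incoming packets
e = [[7, 0, 9]]               # one injected error packet (row vector)
b = [[1], [2], [0]]           # how that packet propagates to the three received rows
Z = matmul(b, e)              # error matrix after propagation
Y = matadd(matmul(A, X), Z)   # what the sink actually receives

print([i for i, row in enumerate(Z) if any(row)])  # [0, 1]: two corrupted rows
print(rank_mod(Z))                                 # 1
```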

A somewhat different approach to network error correction consists of robust network 
codes that have polynomial-time complexity and attain optimal rates in the presence of 
active attacks. The basic idea is to regard the packets injected by an adversarial node as 
a second source of information and add enough redundancy to allow the destination to 
distinguish between relevant and erroneous packets. The capacity achieved depends on 
the rate at which the attacker can obtain information, as well as on the existence of a 
shared secret between the source and the sinks. 


Misbehavior detection 

Generation-based detection schemes generally offer similar advantages to those obtained 
with network error-correcting codes in that the often computationally expensive task of 
detecting the modifications introduced by Byzantine attackers is carried out by the 
destination nodes. The main disadvantage of generation-based detection schemes is 
that only nodes with enough packets from a generation are able to detect malicious 
modifications, and thus usage of such detection schemes can result in large end-to-end 
delays. The underlying assumption is that the attacker cannot see the full rank of the 
packets in the network. It has been shown that a hash scheme with polynomial complexity 
can be used without the need for secret-key distribution. However, the use of a block 
code forces an a-priori decision on the coding rate. 

The key idea of packet-based detection schemes is that some of the intermediate 
nodes in the network can detect polluted data on-the-fly and drop the corresponding 
packets, thus forwarding only valid data. However, packet-based detection schemes 
require active participation of intermediate nodes and are dependent on hash functions, 
which are generally computationally expensive. Alternatively, this type of attack can 
be mitigated by signature schemes based on homomorphic hash functions. The use of 
homomorphic hash functions is specifically tailored for network-coding schemes, since 
the hash of a coded packet can easily be derived from the hashes of previously encoded 
packets, thus enabling intermediate nodes to verify the validity of encoded packets 
prior to mixing them algebraically. Unfortunately, homomorphic hash functions are also 
computationally expensive. 
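A toy version of a discrete-logarithm-based homomorphic hash, in the spirit of the schemes mentioned above, is sketched below. The hash of a vector v with entries in Z_q is H(v) = ∏ g_i^{v_i} mod p, where p = 2q + 1 and every g_i has order q. Then H(a·u + b·w) = H(u)^a · H(w)^b mod p. A node can therefore check a coded packet against the hashes of the source packets without decoding. The parameters p = 23 and q = 11 are deliberately tiny and completely insecure. Real deployments need primes of hundreds or thousands of bits, which is precisely why such hashes are computationally expensive.

```python
p, q = 23, 11    # toy safe-prime pair p = 2q + 1 (insecure, illustration only)
G = [2, 3, 4]    # quadratic residues mod 23, hence elements of order q
assert all(pow(g, q, p) == 1 for g in G)

def hhash(v):
    """Homomorphic hash H(v) = prod_i g_i^{v_i} mod p for v with entries in Z_q."""
    h = 1
    for g, x in zip(G, v):
        h = (h * pow(g, x % q, p)) % p
    return h

def combine(coeffs, vectors):
    """Linear combination of vectors over Z_q (the network-coding operation)."""
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) % q for j in range(len(vectors[0]))]

def verify(coded, coeffs, source_hashes):
    """Check that `coded` is the claimed combination, using only the source hashes."""
    expected = 1
    for c, h in zip(coeffs, source_hashes):
        expected = (expected * pow(h, c, p)) % p
    return hhash(coded) == expected

u, w = [1, 5, 7], [3, 0, 9]    # source packets (symbols in Z_q)
hashes = [hhash(u), hhash(w)]  # distributed by the source (e.g. signed)
coded = combine([2, 3], [u, w])
print(verify(coded, [2, 3], hashes))     # True
tampered = coded[:]
tampered[0] = (tampered[0] + 1) % q
print(verify(tampered, [2, 3], hashes))  # False
```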

There exists a homomorphic signature scheme for network coding that is based on 
Weil pairing in elliptic-curve cryptography. Homomorphic hash functions have also been 
considered in the context of peer-to-peer content distribution with rateless erasure codes 
for multicast transfers. With the goal of preventing both the waste of large amounts 
of bandwidth and the pollution of download caches of network clients, each file is 
compressed to a smaller hash value, with which receivers can check the integrity of 
downloaded blocks. Beyond its independence from the coding rate, the main advantage 
of this process is that it is less computationally expensive for large files than are traditional 
forward error-correction codes (such as Reed-Solomon codes). 

A cooperative security scheme can be used for on-the-fly detection of malicious 
blocks injected in network coding-based peer-to-peer networks. In order to reduce the 
cost of verifying information on-the-fly while efficiently preventing the propagation of 
malicious blocks, a distributed mechanism has been proposed whereby every node performs 
block checks with a certain probability and alerts its neighbors when a suspicious 
block is found. Techniques to prevent denial-of-service attacks due to the dissemination 
of alarms are available. 
The basic idea of such signature schemes is to take advantage of the fact that in linear 
network coding any valid transmitted packet belongs to the subspace spanned by the original 
set of vectors. A signature scheme is thus used to check that a given packet belongs to the 
original subspace. Generating a valid signature for a packet that is not in the subspace has 
been shown to be hard. 
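The subspace property itself can be illustrated with a simpler linear check, substituted here for clarity. Suppose a verifier knows a vector k that is orthogonal to every source packet augmented with its unit encoding vector. Then every valid coded packet w satisfies w · k = 0. A uniformly random forgery passes the test only with probability 1/p. This orthogonal-vector check, the field GF(257), and all data are assumptions. It is not the signature scheme discussed above, and it offers no protection against an attacker who learns k.

```python
P = 257

def augment(native):
    """Prepend unit encoding vectors: v_i = [e_i | x_i]."""
    h = len(native)
    return [[1 if j == i else 0 for j in range(h)] + list(x) for i, x in enumerate(native)]

def null_key(native, tail, p=P):
    """Build k with v_i . k = 0 for every augmented source vector v_i.
    `tail` is an arbitrary (ideally secret, random) choice for the payload part of k."""
    head = [(-sum(a * b for a, b in zip(x, tail))) % p for x in native]
    return head + list(tail)

def check(w, k, p=P):
    """Accept w only if it is orthogonal to k."""
    return sum(a * b for a, b in zip(w, k)) % p == 0

native = [[1, 2], [3, 4], [5, 6]]
V = augment(native)
k = null_key(native, tail=[11, 17])  # hypothetical secret tail
coded = [(2 * a + 5 * b + 7 * c) % P for a, b, c in zip(*V)]  # a valid combination
forged = coded[:]
forged[-1] = (forged[-1] + 1) % P    # payload altered, header left unchanged
print(check(coded, k), check(forged, k))  # True False
```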

A comparison of the bandwidth overhead required by Byzantine error-correction 
and -detection schemes can be carried out as follows. The intermediate nodes are divided 
into regular nodes and trusted nodes, and only the latter are given access to the public 
key of the Byzantine detection scheme in use. Under these assumptions, it is shown 
that packet-based detection is most competitive when the probability of attack is high, 
whereas a generation-based approach is more bandwidth-efficient when the probability 
of attack is low. 


Key-distribution schemes 

The ability to distribute secret keys in a secure manner is an obvious fundamental 
requirement towards assuring cryptographic security. In the case of highly constrained 
mobile ad-hoc and sensor networks, key pre-distribution schemes emerge as a strong 
candidate, mainly because they require considerably less computation and communi- 
cation resources than do trusted-party schemes or public-key infrastructures. The main 
caveat is that secure connectivity can be achieved only in probabilistic terms, i.e. if each 
node is loaded with a sufficiently large number of keys drawn at random from a fixed 
pool, then with high probability it will share at least one key with each neighboring 
node. 
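Under the usual random key-ring model, assumed here, each node draws L distinct keys uniformly from a pool of N keys. Two neighbors then share at least one key with probability 1 - C(N-L, L)/C(N, L). The sketch below evaluates this expression; the pool and ring sizes in the example are hypothetical.

```python
from math import comb

def share_probability(N: int, L: int) -> float:
    """Probability that two independent random L-subsets of an N-key pool intersect.
    math.comb returns 0 when L > N - L, so the probability is then exactly 1."""
    return 1 - comb(N - L, L) / comb(N, L)

print(share_probability(4, 2))  # 5/6
for ring in (25, 50, 75, 100):  # hypothetical ring sizes for a pool of 10 000 keys
    print(ring, round(share_probability(10_000, ring), 3))
```

The quadratic growth of the exponent in L is why moderately sized key rings already give a high connection probability. The guarantee remains probabilistic, which is the caveat noted above.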

It has been shown that network coding can be an effective tool for establishing 
secure connections between low-complexity sensor nodes. In contrast with pure key- 
pre-distribution schemes, it is assumed that a mobile node, e.g. a hand-held device or 
a laptop computer, is available for activating the network and for helping to establish 
secure connections between nodes. By exploiting the benefits of network coding, it is 
possible to design a secret-key-distribution scheme that requires only a small number of 
pre-stored keys, yet ensures that shared-key connectivity is established with a probability 
of unity and that the mobile node is provably oblivious to the distributed keys. 

The basic idea of the protocol, which is illustrated in Figure 9.5, can be summarized 
in the following tasks: 


(a) prior to sensor-node deployment: 
(1) a large pool P of N keys and their N identifiers are generated off-line; 
(2) a different subset of L keys drawn randomly from P and the corresponding L 
identifiers are loaded into the memory of each sensor node; 
(3) a table is constructed with the N key identifiers and N sequences that result 
from performing an XOR of each key with a common protection sequence X; 
(4) the table is stored in the memory of the mobile node; 
(b) after sensor-node deployment: 
(1) the mobile node broadcasts HELLO messages that are received by any sensor 
node within wireless transmission range; 
(2) each sensor node replies with a key identifier; 

(3) on the basis of the received key identifiers the mobile node locates the 
corresponding sequences protected by X and combines them through an XOR 
network-coding operation, thus canceling out X and obtaining the XOR of the 
corresponding keys; 

(4) the mobile node broadcasts the resulting XOR sequence; 

(5) by combining the received XOR sequence with its own key, each node can 
easily recover the key of its neighbor, thus sharing a pair of keys that is kept 
secret from the mobile node. 

Figure 9.5 Secret-key-distribution scheme. Sensor nodes A and B want to exchange two keys via a 
mobile node S. The process is initiated by a HELLO message broadcast by S. Upon receiving this 
message, each sensor node sends back a key identifier $i(\cdot)$ corresponding to one of its keys 
$K_{i(\cdot)}$. Node S then broadcasts the result of the XOR of the two keys, $K_{i(A)} \oplus K_{i(B)}$. 
Once this process is concluded, sensor nodes A and B can communicate using the two keys 
$K_{i(A)}$ and $K_{i(B)}$ (one in each direction). Here, $E_{K_{i(A)}}(m_{A \to B})$ denotes a message 
sent by A to B, encrypted with $K_{i(A)}$, and $E_{K_{i(B)}}(m_{B \to A})$ corresponds to a message 
sent by B to A, encrypted with $K_{i(B)}$. 
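The protocol steps above can be simulated in a few lines. In this sketch, keys and the protection sequence X are random 16-byte strings, and the pool size, ring size, and identifier choice are hypothetical. The mobile node only ever handles X-protected table entries and their XOR, K_i(A) ⊕ K_i(B), which reveals neither key individually. Each sensor node recovers its neighbor's key from that broadcast.

```python
import secrets

KEY_LEN = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# (a) Before deployment
N, L = 100, 5
pool = {i: secrets.token_bytes(KEY_LEN) for i in range(N)}     # key identifier -> key
X = secrets.token_bytes(KEY_LEN)                               # common protection sequence
table = {i: xor(key, X) for i, key in pool.items()}            # stored on the mobile node
rng = secrets.SystemRandom()
ring_A = {i: pool[i] for i in rng.sample(range(N), L)}         # loaded into sensor A
ring_B = {i: pool[i] for i in rng.sample(range(N), L)}         # loaded into sensor B

# (b) After deployment: each sensor answers HELLO with one of its key identifiers
id_A, id_B = next(iter(ring_A)), next(iter(ring_B))
broadcast = xor(table[id_A], table[id_B])   # X cancels out, leaving K_A xor K_B

# Each sensor combines the broadcast with its own key to obtain its neighbor's key
key_B_at_A = xor(broadcast, ring_A[id_A])
key_A_at_B = xor(broadcast, ring_B[id_B])
print(key_B_at_A == pool[id_B], key_A_at_B == pool[id_A])  # True True
```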


Although the use of network coding presented here is limited to XOR operations, 
more powerful secret-key-distribution schemes are likely to result from using linear 
combinations of the stored keys. 


Bibliographical notes 


The field of network coding emerged from the seminal work of Ahlswede, Cai, Li, and 
Yeung [184], who proved that the multicast capacity of a general network can be achieved 
only if intermediate nodes are allowed to encode incoming symbols. Li, Yeung, and Cai 
proved that linear codes [185] are sufficient to achieve the aforementioned multicast 
capacity. The algebraic framework for network coding developed by Koetter and Médard 
in [186] and the development of random linear network coding by Ho et al. in [187] led to 
practical applications in which the nodes in the network generate linear combinations of 
information symbols using random coefficients. A practical approach to random linear 
network coding was proposed and tested by Chou, Wu, and Jain in [188]. Microsoft 
presented the first application of network coding for content distribution in a paper 
by Gkantsidis and Rodriguez [189]. A system implementation of XOR-based network 
coding for wireless networks (the COPE protocol) was presented by Katti et al. in [190]. 
Secure network coding with wiretapping limited to a subset of the network links was first 
addressed by Cai and Yeung in [191]. The connection between secure network coding and 
the wiretap channel of type II of Ozarow and Wyner [192] was investigated by El Rouayheb
and Soljanin in [193]. A weak criterion for the algebraic security provided by network 
coding was presented by Bhattad and Narayanan in [194]. The security potential of the 
algebraic structure of network coding in large-scale networks was analyzed by Lima, 
Médard, and Barros in [195]. Vilela, Lima, and Barros provided in [196] a lightweight
security solution for network coding, whereby only the coding coefficients must be 
encrypted. The proposed scheme found its first application in wireless video [197]. 
Solutions for Byzantine attacks were presented by Jaggi et al. in [198] and by Gkantsidis 
and Rodriguez in [199]. An information-theoretic treatment of network error correction 
was provided by Cai and Yeung in [200]. Koetter and Kschischang later presented 
code constructions for network error correction in [201]. Kim et al. provided a unified 
treatment of robust network coding for peer-to-peer networks under Byzantine attacks 
in [202]. Oliveira, Costa, and Barros investigated the properties of wireless network 
coding for secret-key distribution in [203]. Kim, Barros, Médard, and Koetter showed 
how to detect misbehaving nodes in wireless networks with network coding using an 
algebraic watchdog in [204]. A detailed account of the many facets of network coding 
can be found in the books by Ho and Lun [205] and by Fragouli and Soljanin [206, 207]. 


References 


[1] C. E. Shannon, “Communication theory of secrecy systems,” Bell System Technical Journal, 
vol. 28, no. 4, pp. 656-715, April 1949. 

[2] R. G. Gallager, Information Theory and Reliable Communication. Wiley, 1968. 

[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd edn. Wiley-Interscience, 
2006. 

[4] R. W. Yeung, A First Course in Information Theory. Springer, 2002. 

[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless 
Systems. Akadémiai Kiadó, 1981. 

[6] G. Kramer, Topics in Multi-User Information Theory. NOW Publishers, 2008. 

[7] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, 
vol. 27, nos. 7/10, pp. 379-423/623-656, July/October 1948. 

[8] G. Kramer, “Capacity results for the discrete memoryless network,” IEEE Transactions on
Information Theory, vol. 49, no. 1, pp. 4-21, January 2003. 

[9] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE 
Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, July 1973. 

[10] M. H. M. Costa, “Writing on dirty paper,” IEEE Transactions on Information Theory,
vol. 29, no. 3, pp. 439-441, May 1983. 

[11] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information 
at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1-10, January 
1976. 

[12] R. Ahlswede, “Multi-way communication channels,” in Proc. International Symposium on 
Information Theory, Tsahkadsor, Armenian SSR, USSR, September 1971, pp. 23-52.

[13] H. Liao, “Multiple access channels,” Ph.D. dissertation, University of Hawaii, 1972. 

[14] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, vol. 18,
no. 1, pp. 2-14, January 1972. 

[15] P. Bergmans, “Random coding theorem for broadcast channels with degraded components,” 
IEEE Transactions on Information Theory, vol. 19, no. 2, pp. 197-207, March 1973. 

[16] R. G. Gallager, “Capacity and coding for degraded broadcast channels,” Problemy Peredachi
Informatsii, vol. 10, no. 3, pp. 3-14, 1974. 

[17] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on Information Theory, 
vol. 44, no. 6, pp. 2524-2530, October 1998. 

[18] I. Csiszár and J. Körner, “Broadcast channels with confidential messages,” IEEE Transac- 
tions on Information Theory, vol. 24, no. 3, pp. 339-348, May 1978. 

[19] G. S. Vernam, “Cipher printing telegraph systems for secret wire and radio telegraphic 
communications,” Transactions of the American Institute of Electrical Engineers, vol. 45, 
no. 1, pp. 295-301, January 1926. 


[20] G. D. Forney, Jr., “On the role of MMSE estimation in approaching the information-theoretic 
limits of linear Gaussian channels: Shannon meets Wiener,” in Proc. 41st Annual Allerton
Conference on Communication, Control, and Computing, Monticello, IL, USA, October 
2003, pp. 430-439. 

[21] A. D. Wyner, “The wire-tap channel,” Bell System Technical Journal, vol. 54, no. 8, 
pp. 1355-1367, October 1975. 

[22] S. Leung-Yan-Cheong, “On a special class of wiretap channels,” IEEE Transactions on
Information Theory, vol. 23, no. 5, pp. 625-627, September 1977.

[23] J. L. Massey, “A simplified treatment of Wyner’s wire-tap channel,” in Proc. 21st Allerton
Conference on Communication, Control, and Computing, Monticello, IL, USA, October 
1983, pp. 268-276. 

[24] I. Csiszár, “Almost independence and secrecy capacity,” Problemy Peredachi Informatsii, 
vol. 32, no. 1, pp. 40-47, January-March 1996.

[25] M. Hayashi, “General nonasymptotic and asymptotic formulas in channel resolvability and 
identification capacity and their application to the wiretap channel,” IEEE Transactions on
Information Theory, vol. 52, no. 4, pp. 1562-1575, April 2006. 

[26] M. Bloch and J. N. Laneman, “On the secrecy capacity of arbitrary wiretap channels,” in 
Proc. 46th Allerton Conference on Communication, Control, and Computing, Monticello, 
IL, USA, September 2008, pp. 818-825. 

[27] B. P. Dunn, M. Bloch, and J. N. Laneman, “Secure bits through queues,” in Proc. IEEE 
Information Theory Workshop on Networking and Information Theory, Volos, Greece, June 
2009, pp. 37-41. 

[28] J. Körner and K. Marton, “Comparison of two noisy channels,” in Proc. Topics in Informa- 
tion Theory, Keszthely, Hungary, 1977, pp. 411-423. 

[29] M. van Dijk, “On a special class of broadcast channels with confidential messages,” IEEE 
Transactions on Information Theory, vol. 43, no. 2, pp. 712-714, March 1997. 

[30] C. Nair, “Capacity regions of two new classes of 2-receiver broadcast channels,” in Proc. 
IEEE International Symposium on Information Theory, Seoul, South Korea, July 2009, 
pp. 1839-1843. 

[31] H. Yamamoto, “Rate-distortion theory for the Shannon cipher system,” IEEE Transactions
on Information Theory, vol. 43, no. 3, pp. 827-835, May 1997. 

[32] S. K. Leung-Yan-Cheong, “Multi-user and wiretap channels including feedback,” Ph.D. 
dissertation, Stanford University, 1976. 

[33] R. Ahlswede and N. Cai, “Transmission, identification and common randomness capaci- 
ties for wire-tap channels with secure feedback from the decoder,” in General Theory of 
Information Transfer and Combinatorics. Springer-Verlag, 2006, pp. 258-275. 

[34] E. Ardestanizadeh, M. Franceschetti, T. Javidi, and Y.-H. Kim, “Wiretap channel with 
secure rate-limited feedback,” IEEE Transactions on Information Theory, vol. 55, no. 12, 
pp. 5353-5361, December 2009. 

[35] M. Feder and N. Merhav, “Relations between entropy and error probability,” IEEE
Transactions on Information Theory, vol. 40, no. 1, pp. 259-266, January 1994.

[36] K. Yasui, T. Suko, and T. Matsushima, “An algorithm for computing the secrecy capacity 
of broadcast channels with confidential messages,” in Proc. IEEE International Symposium 
on Information Theory, Nice, France, July 2007, pp. 936-940. 

[37] K. R. Gowtham and A. Thangaraj, “Computation of secrecy capacity for more-capable 
channel pairs,” in Proc. IEEE International Symposium on Information Theory, Toronto, 
Canada, July 2008, pp. 529-533. 


[38] G. Van Assche, Quantum Cryptography and Secret-Key Distillation. Cambridge University 
Press, 2006. 

[39] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information. Cam- 
bridge University Press, 2000. 

[40] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, “Quantum cryptography,” Reviews of 
Modern Physics, vol. 74, no. 1, pp. 145-195, January 2002. 

[41] U. Maurer, R. Renner, and S. Wolf, “Unbreakable keys from random noise,” in Security with 
Noisy Data, P. Tuyls, B. Skoric, and T. Kevenaar, Eds. Springer-Verlag, 2007, pp. 21-44. 

[42] U. Maurer, “Secret key agreement by public discussion from common information,” IEEE
Transactions on Information Theory, vol. 39, no. 3, pp. 733-742, May 1993. 

[43] R. Ahlswede and I. Csiszár, “Common randomness in information theory and cryptography. 
I. Secret sharing,” IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1121-1132, 
July 1993. 

[44] A. A. Gohari and V. Anantharam, “Information-theoretic key agreement of multiple
terminals – part I,” IEEE Transactions on Information Theory, vol. 56, no. 8, pp. 3973-3996,
August 2010. 

[45] A. A. Gohari and V. Anantharam, “Information-theoretic key agreement of multiple
terminals – part II: Channel model,” IEEE Transactions on Information Theory, vol. 56, no. 8,
pp. 3997-4010, August 2010. 

[46] U. M. Maurer and S. Wolf, “Unconditionally secure key agreement and intrinsic conditional 
information,” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 499-514, March 
1999. 

[47] R. Renner and S. Wolf, “New bounds in secret-key agreement: The gap between formation 
and secrecy extraction,” in Proc. EUROCRYPT, 2003, pp. 562-577.

[48] I. Csiszár and P. Narayan, “Common randomness and secret key generation with a helper,” 
IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 344-366, March 2000. 

[49] A. Khisti, S. N. Diggavi, and G. Wornell, “Secret-key generation with correlated sources and 
noisy channels,” in Proc. IEEE International Symposium on Information Theory, Toronto, 
Canada, July 2008, pp. 1005-1009. 

[50] V. Prabhakaran and K. Ramchandran, “A separation result for secure communication,” in 
Proc. 45th Allerton Conference on Communications, Control and Computing, Monticello, 
IL, USA, September 2007, pp. 34-41.

[51] S. Watanabe and Y. Oohama, “Secret key agreement from vector Gaussian sources by rate 
limited public communications,” in Proc. IEEE International Symposium on Information 
Theory, Austin, TX, USA, June 2010, pp. 2597-2601; see also arXiv:1001.3705v1.

[52] M. J. Gander and U. M. Maurer, “On the secret-key rate of binary random variables,” in 
Proc. IEEE International Symposium on Information Theory, Trondheim, Norway, June 
1994, p. 351. 

[53] S. Liu, H. C. A. Van Tilborg, and M. Van Dijk, “A practical protocol for advantage distillation 
and information reconciliation,” Designs, Codes and Cryptography, vol. 30, no. 1, pp. 39- 
62, August 2003. 

[54] M. Naito, S. Watanabe, R. Matsumoto, and T. Uyematsu, “Secret key agreement by reli- 
ability information of signals in Gaussian Maurer’s models,” in Proc. IEEE International 
Symposium on Information Theory, Toronto, Canada, July 2008, pp. 727-731. 

[55] J. Muramatsu, K. Yoshimura, and P. Davis, “Secret key capacity and advantage distillation 
capacity,” in Proc. IEEE International Symposium on Information Theory, Seattle, WA, 
USA, July 2006, pp. 2598-2602. 


[56] G. Brassard and L. Salvail, “Secret-key reconciliation by public discussion,” in Proc. 
Advances in Cryptology — Eurocrypt, T. Helleseth, Ed. Springer-Verlag, May 1993, 
pp. 411-423. 

[57] W. T. Buttler, S. K. Lamoreaux, J. R. Torgerson, G. H. Nickel, C. H. Donahue, and C. G. 
Peterson, “Fast, efficient error reconciliation for quantum cryptography,” Physical Review 
A, vol. 67, no. 5, pp. 052 303/1-8, May 2003.

[58] D. Elkouss, A. Leverrier, R. Alléaume, and J. J. Boutros, “Efficient reconciliation protocol 
for discrete-variable quantum key distribution,” in Proc. IEEE International Symposium on 
Information Theory, Seoul, South Korea, July 2009, pp. 1879-1883. 

[59] G. Van Assche, J. Cardinal, and N. J. Cerf, “Reconciliation of a quantum-distributed
Gaussian key,” IEEE Transactions on Information Theory, vol. 50, no. 2, pp. 394-400, February
2004. 

[60] K.-C. Nguyen, G. Van Assche, and N. J. Cerf, “Side-information coding with turbo codes 
and its application to quantum key distribution,” in Proc. International Symposium on 
Information Theory and its Applications, Parma, Italy, October 2004, pp. 1274-1279. 

[61] M. Bloch, A. Thangaraj, S. W. McLaughlin, and J.-M. Merolla, “LDPC-based Gaussian 
key reconciliation,” in Proc. IEEE Information Theory Workshop, Punta del Este, Uruguay, 
March 2006, pp. 116-120; extended report available at arXiv:cs.IT/0509041. 

[62] C. Ye, A. Reznik, and Y. Shah, “Extracting secrecy from jointly Gaussian random variables,” 
in Proc. IEEE International Symposium on Information Theory, Seattle, WA, USA, July 
2006, pp. 2593-2597. 

[63] A. Rényi, “On measures of entropy and information,” in Proc. Fourth Berkeley Symposium 
on Mathematical Statistics and Probability, vol. 1, Berkeley, CA, USA, 1961, pp. 547-561. 

[64] C. H. Bennett, G. Brassard, C. Crépeau, and U. Maurer, “Generalized privacy amplification,” 
IEEE Transactions on Information Theory, vol. 41, no. 6, pp. 1915-1923, November 1995. 

[65] J. L. Carter and M. N. Wegman, “Universal classes of hash functions,” Journal of Computer 
and System Sciences, vol. 18, no. 2, pp. 143-154, April 1979. 

[66] M. N. Wegman and J. L. Carter, “New hash functions and their use in authentication and set
equality,” Journal of Computer and System Sciences, vol. 22, no. 3, pp. 265-279, June
1981.

[67] D. R. Stinson, “Universal hashing and authentication codes,” Designs, Codes and Cryptog- 
raphy, vol. 4, no. 4, pp. 369-380, October 1994. 

[68] U. M. Maurer and S. Wolf, “Information-theoretic key agreement: From weak to strong 
secrecy for free,” in Advances in Cryptology — Eurocrypt 2000, B. Preneel, Ed. Springer- 
Verlag, 2000, p. 351. 

[69] S. Vadhan, “Extracting all the randomness from a weakly random source,” Massachusetts
Institute of Technology, Cambridge, MA, USA, Technical Report, 1998. 

[70] C. Cachin, “Entropy measures and unconditional security in cryptography,” Ph.D.
dissertation, ETH Zürich, Zurich, Switzerland, 1997.

[71] C. Cachin and U. M. Maurer, “Linking information reconciliation and privacy amplifica- 
tion,” Journal of Cryptology, vol. 10, no. 2, pp. 97-110, March 1997. 

[72] J. Muramatsu, “Secret key agreement from correlated source outputs using low density par- 
ity check matrices,” IEICE Transactions on Fundamentals of Electronics, Communications 
and Computer Sciences, vol. E89-A, no. 7, pp. 2036-2046, July 2006. 

[73] S. Nitinawarat, “Secret key generation for correlated Gaussian sources,” in Proc. 45th
Allerton Conference on Communications, Control and Computing, Monticello, IL, USA, 
September 2007, pp. 1054-1058. 


[74] U. Maurer and S. Wolf, “Secret-key agreement over unauthenticated public channels – part I.
Definitions and a completeness result,” IEEE Transactions on Information Theory, vol. 49,
no. 4, pp. 822-831, April 2003. 

[75] U. Maurer and S. Wolf, “Secret-key agreement over unauthenticated public channels –
part II. The simulatability condition,” IEEE Transactions on Information Theory, vol. 49,
no. 4, pp. 832-838, April 2003.

[76] U. Maurer and S. Wolf, “Secret-key agreement over unauthenticated public channels –
part III. Privacy amplification,” IEEE Transactions on Information Theory, vol. 49, no. 4,
pp. 839-851, April 2003. 

[77] H. Imai, K. Kobara, and K. Morozov, “On the possibility of key agreement using variable 
directional antenna,” in Proc. 1st Joint Workshop on Information Security, Seoul, South
Korea, September 2006, pp. 153-167. 

[78] C. Ye, S. Mathur, A. Reznik, Y. Shah, W. Trappe, and N. B. Mandayam,
“Information-theoretically secret key generation for fading wireless channels,” IEEE Transactions on
Information Forensics and Security, vol. 5, no. 2, pp. 240-254, June 2010. 

[79] C. Chen and M. A. Jensen, “Secret key establishment using temporally and spatially 
correlated wireless channel coefficients,” IEEE Transactions on Mobile Computing,
vol. 10, no. 2, pp. 205-215, February 2011. 

[80] F. Grosshans, G. Van Assche, J. Wenger, R. Brouri, N. J. Cerf, and P. Grangier, “Quantum 
key distribution using Gaussian-modulated coherent states,” Letters to Nature, vol. 421, no. 
6920, pp. 238-241, January 2003. 

[81] J. Lodewyck, M. Bloch, R. Garcia-Patron, S. Fossier, E. Karpov, E. Diamanti, T. Debuiss- 
chert, N. J. Cerf, R. Tualle-Brouri, S. W. McLaughlin, and P. Grangier, “Quantum key 
distribution over 25 km with an all-fiber continuous-variable system,” Physical Review A, 
vol. 76, pp. 042 305/1-10, October 2007.

[82] H. Chabanne and G. Fumaroli, “Noisy cryptographic protocols for low-cost RFID tags,” 
IEEE Transactions on Information Theory, vol. 52, no. 8, pp. 3562-3566, August 
2006. 

[83] A. Khisti and G. Wornell, “The MIMOME channel,” in Proc. 45th Allerton Confer- 
ence on Communication, Control and Computing, Monticello, IL, USA, September 2007, 
pp. 625-632; available online: http://allegro.mit.edu/pubs/posted/journal/2008-khisti- 
wornell-it.pdf. 

[84] A. Khisti and G. Wornell, “Secure transmission with multiple antennas – part II: The MIMOME
wiretap channel,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5515-
5532, November 2010. 

[85] Y. Liang, H. V. Poor, and S. Shamai (Shitz), “Secure communication over fading channels,” 
IEEE Transactions on Information Theory, vol. 54, no. 6, pp. 2470-2492, June 2008. 

[86] P. K. Gopala, L. Lai, and H. El Gamal, “On the secrecy capacity of fading channels,” IEEE 
Transactions on Information Theory, vol. 54, no. 10, pp. 4687-4698, October 2008. 

[87] S. K. Leung-Yan-Cheong and M. E. Hellman, “The Gaussian wire-tap channel,” IEEE
Transactions on Information Theory, vol. 24, no. 4, pp. 451-456, July 1978.

[88] R. Bustin, R. Liu, H. V. Poor, and S. Shamai, “An MMSE approach to the secrecy capacity 
of the MIMO Gaussian wiretap channel,” EURASIP Journal on Wireless Communications
and Networking, vol. 2009, pp. 370 970/1-8, 2009.

[89] F. Oggier and B. Hassibi, “The secrecy capacity of the MIMO wiretap channel,” in
Proc. IEEE International Symposium on Information Theory, Toronto, Canada, July 2008, 
pp. 524-528. 


[90] T. Liu and S. Shamai, “A note on the secrecy capacity of the multiple-antenna wiretap 
channel,” IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2547-2553, June 
2009. 

[91] S. Shafiee, N. Liu, and S. Ulukus, “Towards the secrecy capacity of the Gaussian MIMO
wire-tap channel: The 2-2-1 channel,” IEEE Transactions on Information Theory, vol. 55,
no. 9, pp. 4033-4039, September 2009. 

[92] M. Gursoy, “Secure communication in the low-SNR regime: A characterization of the 
energy-secrecy tradeoff,” in Proc. IEEE International Symposium on Information Theory,
Seoul, South Korea, July 2009, pp. 2291-2295. 

[93] R. Negi and S. Goel, “Secret communication using artificial noise,” in 62nd Vehicular 
Technology Conference (VTC), vol. 3, Dallas, TX, USA, September 2005, pp. 1906- 
1910. 

[94] S. Goel and R. Negi, “Guaranteeing secrecy using artificial noise,” IEEE Transactions on 
Wireless Communications, vol. 7, no. 6, pp. 2180-2189, June 2008. 

[95] A. Khisti, G. Wornell, A. Wiesel, and Y. Eldar, “On the Gaussian MIMO wiretap channel,”
in Proc. IEEE International Symposium on Information Theory, Nice, France, July 2007, 
pp. 2471-2475. 

[96] J. Barros and M. R. D. Rodrigues, “Secrecy capacity of wireless channels,” in Proc. IEEE 
International Symposium on Information Theory, Seattle, WA, USA, July 2006, pp. 356- 
360. 

[97] Z. Li, R. Yates, and W. Trappe, “Secret communication with a fading eavesdropper channel,” 
in Proc. IEEE International Symposium on Information Theory, Nice, France, June 2007, 
pp. 1296-1300. 

[98] H. Jeon, N. Kim, M. Kim, H. Lee, and J. Ha, “Secrecy capacity over correlated ergodic 
fading channels,” in Proc. IEEE Military Communications Conference, San Diego, CA, 
USA, November 2008, pp. 1-7. 

[99] O. O. Koyluoglu, H. El Gamal, L. Lai, and H. V. Poor, “On the secure degrees of freedom 
in the k-user Gaussian interference channels,” in Proc. IEEE International Symposium on 
Information Theory, Toronto, Canada, July 2008, pp. 384-388. 

[100] M. Kobayashi, M. Debbah, and S. Shamai (Shitz), “Secured communication over
frequency-selective fading channels: A practical Vandermonde precoding,” EURASIP Journal on
Wireless Communications and Networking, vol. 2009, pp. 386 547/1-19, 2009.

[101] M. Bloch and J. N. Laneman, “Information-spectrum methods for information-theoretic 
security,” in Proc. Information Theory and Applications Workshop, San Diego, CA, USA, 
February 2009, pp. 23-28. 

[102] X. Tang, R. Liu, P. Spasojevic, and H. V. Poor, “On the throughput of secure hybrid-ARQ 
protocols for Gaussian block-fading channels,” IEEE Transactions on Information Theory, 
vol. 55, no. 4, pp. 1575-1591, April 2009. 

[103] X. Tang, H. Poor, R. Liu, and P. Spasojevic, “Secret-key sharing based on layered broadcast 
coding over fading channels,” in Proc. IEEE International Symposium on Information 
Theory, Seoul, South Korea, July 2009, pp. 2762-2766. 

[104] Y. Liang, L. Lai, H. Poor, and S. Shamai, “The broadcast approach over fading Gaussian 
wiretap channels,” in Proc. Information Theory Workshop, Taormina, Italy, October 2009, 
pp. 1-5. 

[105] M. Bloch, J. Barros, M. R. D. Rodrigues, and S. W. McLaughlin, “Wireless information- 
theoretic security,” IEEE Transactions on Information Theory, vol. 54, no. 6, pp. 2515-2534, 
June 2008. 


[106] T. F. Wong, M. Bloch, and J. M. Shea, “Secret sharing over fast-fading MIMO wiretap 
channels,” EURASIP Journal on Wireless Communications and Networking, vol. 2009, 
pp. 506 973/1-17, 2009. 

[107] R. Wilson, D. Tse, and R. A. Scholtz, “Channel identification: Secret sharing using reci- 
procity in ultrawideband channels,” IEEE Transactions on Information Forensics and Secu- 
rity, vol. 2, no. 3, pp. 364-375, September 2007. 

[108] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2008. 

[109] H. Jin and T. Richardson, “Block error iterative decoding capacity for LDPC codes,” 
in Proc. International Symposium on Information Theory, Adelaide, Australia, 2005, 
pp. 52-56. 

[110] R. G. Gallager, “Low density parity check codes,” Ph.D. dissertation, Massachusetts Institute 
of Technology, Cambridge, MA, USA, 1963. 

[111] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under 
message-passing decoding,” IEEE Transactions on Information Theory, vol. 47, no. 2, 
pp. 599-618, February 2001. 

[112] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, “Analysis of sum-product decoding of
low-density parity-check codes using a Gaussian approximation,” IEEE Transactions on
Information Theory, vol. 47, no. 2, pp. 657-670, February 2001.

[113] S.-Y. Chung, G. D. Forney, Jr., T. J. Richardson, and R. Urbanke, “On the design of
low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Communications
Letters, vol. 5, no. 2, pp. 58-60, February 2001.

[114] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching 
irregular low-density parity-check codes,” IEEE Transactions on Information Theory,
vol. 47, no. 2, pp. 619-637, February 2001. 

[115] A. Shokrollahi and R. Storn, “Design of efficient erasure codes with differential evolu- 
tion,” in Proc. IEEE International Symposium on Information Theory, Sorrento, Italy, June 
2000. 

[116] L. H. Ozarow and A. D. Wyner, “Wire tap channel II,” AT&T Bell Laboratories Technical 
Journal, vol. 63, no. 10, pp. 2135-2157, December 1984. 

[117] A. Thangaraj, S. Dihidar, A. R. Calderbank, S. W. McLaughlin, and J.-M. Merolla,
“Applications of LDPC codes to the wiretap channel,” IEEE Transactions on Information Theory,
vol. 53, no. 8, pp. 2933-2945, August 2007.

[118] A. T. Suresh, A. Subramanian, A. Thangaraj, M. Bloch, and S. McLaughlin, “Strong secrecy 
for erasure wiretap channels,” in Proc. IEEE Information Theory Workshop, Dublin, Ireland, 
2010, pp. 1-5. 

[119] A. Subramanian, A. Thangaraj, M. Bloch, and S. W. McLaughlin, “Strong secrecy on 
the binary erasure wiretap channel using large-girth LDPC codes,” submitted to IEEE
Transactions on Information Forensics and Security, September 2010. Available online: 
arXiv:1009.3130. 

[120] V. Rathi, M. Andersson, R. Thobaben, J. Kliewer, and M. Skoglund, “Two edge type LDPC 
codes for the wiretap channel,” in Proc. 43rd Asilomar Conference on Signals, Systems
and Computers, Pacific Grove, CA, USA, November 2009, pp. 834-838. 

[121] G. Cohen and G. Zémor, “Syndrome-coding for the wiretap channel revisited,” in Proc.
IEEE Information Theory Workshop, Chengdu, China, October 2006, pp. 33-36. 

[122] R. Liu, Y. Liang, H. V. Poor, and P. Spasojević, “Secure nested codes for type II wiretap 
channels,” in Proc. IEEE Information Theory Workshop, Lake Tahoe, CA, USA, September 
2007, pp. 337-342. 


[123] E. Verriest and M. Hellman, “Convolutional encoding for Wyner’s wiretap channel,” IEEE
Transactions on Information Theory, vol. 25, no. 2, pp. 234-236, March 1979. 

[124] V. Wei, “Generalized Hamming weights for linear codes,” IEEE Transactions on Information 
Theory, vol. 37, no. 5, pp. 1412-1418, September 1991. 

[125] H. Mahdavifar and A. Vardy, “Achieving the secrecy capacity of wiretap channels using 
polar codes,” in Proc. IEEE International Symposium on Information Theory, Austin, TX, 
USA, June 2010, pp. 913-917. Available online: arXiv:1001.0210v1. 

[126] M. Andersson, V. Rathi, R. Thobaben, J. Kliewer, and M. Skoglund, “Nested polar codes 
for wiretap and relay channels,” IEEE Communications Letters, vol. 14, no. 4, pp. 752-754, 
June 2010. 

[127] O. O. Koyluoglu and H. El Gamal, “Polar coding for secure transmission and key
agreement,” in Proc. IEEE International Symposium on Personal Indoor and Mobile Radio
Communications, Istanbul, Turkey, 2010, pp. 2698-2703. 

[128] E. Hof and S. Shamai, “Secrecy-achieving polar-coding for binary-input memoryless sym- 
metric wire-tap channels,” in Proc. IEEE Information Theory Workshop, Dublin, Ireland, 
2010, pp. 1-5. 

[129] D. Klinc, J. Ha, S. McLaughlin, J. Barros, and B.-J. Kwak, “LDPC codes for the Gaussian 
wiretap channel,” in Proc. IEEE Information Theory Workshop, Taormina, Sicily, October
2009, pp. 95-99. 

[130] J.-C. Belfiore and P. Solé, “Unimodular lattices for the Gaussian wiretap
channel,” in Proc. IEEE Information Theory Workshop, Dublin, Ireland, September 2010,
pp. 1-5. 

[131] J.-C. Belfiore and F. Oggier, “Secrecy gain: A wiretap lattice code design,” in Proc. Interna- 
tional Symposium on Information Theory and its Applications, Taichung, Taiwan, October 
2010. 

[132] A. Wyner, “Recent results in the Shannon theory,” IEEE Transactions on Information
Theory, vol. 20, no. 1, pp. 2-10, January 1974. 

[133] A. D. Liveris, Z. Xiong, and C. N. Georghiades, “Compression of binary sources with 
side information at the decoder using LDPC codes,” IEEE Communications Letters, vol. 6,
no. 10, pp. 440-442, October 2002.

[134] J. Chen, D. He, and A. Jagmohan, “Slepian-Wolf code design via source-channel
correspondence,” in Proc. IEEE International Symposium on Information Theory, Seattle, WA,
USA, July 2006, pp. 2433-2437. 

[135] A. Leverrier, R. Alléaume, J. Boutros, G. Zémor, and P. Grangier, “Multidimensional 
reconciliation for a continuous-variable quantum key distribution,” Physical Review A, 
vol. 77, no. 4, p. 042325, April 2008. 

[136] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, “Multilevel codes: Theoretical concepts 
and practical design rules,” IEEE Transactions on Information Theory, vol. 45, no. 5,
pp. 1361-1391, July 1999. 

[137] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE
Transactions on Information Theory, vol. 44, no. 3, pp. 927-946, May 1998.

[138] J. Hou, P. H. Siegel, L. B. Milstein, and H. D. Pfister, “Capacity-approaching bandwidth- 
efficient coded modulation schemes based on low-density parity-check codes,” IEEE Trans- 
actions on Information Theory, vol. 49, no. 9, pp. 2141-2155, September 2003. 

[139] W. Stallings, Cryptography and Network Security: Principles and Practice. Prentice Hall, 
2010. 


[140] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE Transactions on Infor- 
mation Theory, vol. 22, no. 6, pp. 644-654, November 1976. 

[141] R. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and 
public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120-126,
February 1978. 

[142] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd edn. 
Wiley, 1996. 

[143] A. Menezes, P. Van Oorschot, and S. Vanstone, Handbook of Applied Cryptography, 5th edn. 
CRC Press, 2001. 

[144] M. Médard, D. Marquis, R. Barry, and S. Finn, “Security issues in all-optical networks,” 
IEEE Network, vol. 11, no. 3, pp. 42-48, May-June 1997. 

[145] U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Transactions on
Information Theory, vol. 46, no. 4, pp. 1350-1356, July 2000.

[146] Z. Li, W. Xu, R. Miller, and W. Trappe, “Securing wireless systems via lower layer 
enforcements,” in Proc. 5th ACM Workshop on Wireless Security, Los Angeles, CA, USA, 
September 2006, pp. 33-42.

[147] Y. Liang, H. V. Poor, and S. Shamai (Shitz), Information-Theoretic Security. Now Publishers,
2009. 

[148] R. Liu and W. Trappe, Eds., Securing Wireless Communications at the Physical Layer. 
Springer-Verlag, 2010. 

[149] E. Tekin and A. Yener, “The general Gaussian multiple-access and two-way wiretap 
channels: Achievable rates and cooperative jamming,” IEEE Transactions on Information
Theory, vol. 54, no. 6, pp. 2735-2751, June 2008. 

[150] X. He and A. Yener, “On the role of feedback in two-way secure communications,” in Proc. 
42nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 
October 2008, pp. 1093-1097. 

[151] M. Bloch, “Channel scrambling for secrecy,” in Proc. of IEEE International Symposium on 
Information Theory, Seoul, South Korea, July 2009, pp. 2452-2456. 

[152] M. Médard, “Capacity of correlated jamming channels,” in Proc. Allerton
Conference on Communications, Computing and Control, Monticello, IL, USA, October
1997. 

[153] X. Tang, R. Liu, P. Spasojević, and H. V. Poor, “Interference-assisted secret commu- 
nications,” in Proc. IEEE Information Theory Workshop, Porto, Portugal, May 2008, 
pp. 164-168. 

[154] X. Tang, R. Liu, P. Spasojevic, and H. V. Poor, “The Gaussian wiretap channel with a 
helping interferer,” in Proc. IEEE International Symposium on Information Theory, Toronto, 
Canada, July 2008, pp. 389-393. 

[155] X. He and A. Yener, “Providing secrecy with lattice codes,” in Proc. 46th Annual Allerton 
Conference on Communication, Control, and Computing, Monticello, IL, USA, September 
2008, pp. 1199-1206. 

[156] X. He and A. Yener, “Secure degrees of freedom for Gaussian channels with interference: 
Structured codes outperform Gaussian signaling,” in Proc. IEEE Global Telecommunications
Conference, Honolulu, HI, USA, December 2009, pp. 1-6.

[157] E. Ekrem and S. Ulukus, “Secrecy in cooperative relay broadcast channels,” in Proc. IEEE 
International Symposium on Information Theory, Toronto, Canada, July 2008,
pp. 2217-2221.


[158] M. Bloch and A. Thangaraj, “Confidential messages to a cooperative relay,” in Proc. IEEE 
Information Theory Workshop, Porto, Portugal, May 2008, pp. 154-158. 

[159] J. P. Vilela, M. Bloch, J. Barros, and S. W. McLaughlin, “Friendly jamming for wireless 
secrecy,” in Proc. IEEE International Conference on Communications, Cape Town, South 
Africa, May 2010, pp. 1550-3607. 

[160] O. Simeone and A. Yener, “The cognitive multiple access wire-tap channel,” in Proc. 43rd 
Annual Conference on Information Sciences and Systems, Baltimore, MD, USA, March 
2009, pp. 158-163. 

[161] Y. Liang, A. Somekh-Baruch, H. V. Poor, S. Shamai, and S. Verdú, “Capacity of cognitive 
interference channels with and without secrecy,” IEEE Transactions on Information Theory,
vol. 55, no. 2, pp. 604-619, February 2009. 

[162] C. Mitrpant, A. J. H. Vinck, and Y. Luo, “An achievable region for the Gaussian wiretap 
channel with side information,” IEEE Transactions on Information Theory, vol. 52, no. 5,
pp. 2181-2190, May 2006. 

[163] G. T. Amariucai and S. Wei, “Secrecy rates of binary wiretapper channels using
feedback schemes,” in Proc. 42nd Annual Conference on Information Sciences and Systems,
Princeton, NJ, USA, March 2008, pp. 624-629. 

[164] L. Lai, H. El Gamal, and H. V. Poor, “The wiretap channel with feedback: Encryption over 
the channel,” IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 5059-5067, 
November 2008. 

[165] D. Gündüz, D. R. Brown, and H. V. Poor, “Secret communication with feedback,” in Proc. 
International Symposium on Information Theory and Its Applications, Auckland, New 
Zealand, May 2008, pp. 1-6. 

[166] A. El Gamal, O. O. Koyluoglu, M. Youssef, and H. El Gamal, “New achievable secrecy rate 
regions for the two way wiretap channels,” in Proc. IEEE Information Theory Workshop, 
Cairo, Egypt, January 2010, pp. 1-5. 

[167] X. He and A. Yener, “A new outer bound for the secrecy capacity region of the Gaussian 
two-way wiretap channels,” in Proc. IEEE International Conference on Communications, 
Cape Town, South Africa, May 2010, pp. 1-5. 

[168] Y. Oohama, “Coding for relay channels with confidential messages,” in Proc. IEEE
Information Theory Workshop, Cairns, Australia, September 2001, pp. 87-89.

[169] L. Lai and H. El Gamal, “The relay-eavesdropper channel: Cooperation for secrecy,” IEEE 
Transactions on Information Theory, vol. 54, no. 9, pp. 4005-4019, September 2008.

[170] M. Yuksel and E. Erkip, “Secure communication with a relay helping the wire-tapper,” 
in Proc. IEEE Information Theory Workshop, Lake Tahoe, CA, USA, September 2007, 
pp. 595-600. 

[171] X. He and A. Yener, “Secure communication with a Byzantine relay,” in Proc. IEEE
International Symposium on Information Theory, Seoul, South Korea, July 2009, pp. 2096-2100.

[172] X. He and A. Yener, “Two-hop secure communication using an untrusted relay,” EURASIP
Journal on Wireless Communications and Networking, vol. 2009, pp. 305 146/1-13, 2009.

[173] X. He and A. Yener, “Cooperation with an untrusted relay: A secrecy perspective,” IEEE
Transactions on Information Theory, vol. 56, no. 8, pp. 3807-3827, August 2010.

[174] A. Khisti, A. Tchamkerten, and G. W. Wornell, “Secure broadcasting over fading channels,” 
IEEE Transactions on Information Theory, vol. 54, no. 6, pp. 2453-2469, June 2008. 

[175] Y. Liang, G. Kramer, H. V. Poor, and S. Shamai (Shitz), “Compound wiretap channels,” EURASIP
Journal on Wireless Communications and Networking, vol. 2009, pp. 142 374/1-12, 2009. 


[176] I. Csiszár and P. Narayan, “Secrecy capacities for multiple terminals,” IEEE Transactions
on Information Theory, vol. 50, no. 12, pp. 3047-3061, December 2004.

[177] I. Csiszár and P. Narayan, “Secrecy capacities for multiterminal channel models,” IEEE 
Transactions on Information Theory, vol. 54, no. 6, pp. 2437-2452, June 2008. 

[178] I. Csiszár and P. Narayan, “Secrecy generation for multiple input multiple output channel 
models,” in Proc. IEEE International Symposium on Information Theory, Seoul, South Korea,
July 2009, pp. 2447-2451. 

[179] S. Nitinawarat, A. Barg, P. Narayan, C. Ye, and A. Reznik, “Perfect secrecy, perfect
omniscience and Steiner tree packing,” in Proc. IEEE International Symposium on Information
Theory, Seoul, South Korea, July 2009, pp. 1288-1292. 

[180] S. Nitinawarat and P. Narayan, “Perfect secrecy and combinatorial tree packing,” in Proc. 
IEEE International Symposium on Information Theory, Austin, TX, USA, June 2010, 
pp. 2622-2626. 

[181] M. Haenggi, “The secrecy graph and some of its properties,” in Proc. IEEE International 
Symposium on Information Theory, Toronto, Canada, July 2008, pp. 539-543. 

[182] P. C. Pinto, J. Barros, and M. Z. Win, “Physical-layer security in stochastic wireless
networks,” in Proc. 11th IEEE Singapore International Conference on Communication
Systems, Guangzhou, China, November 2008, pp. 974-979.

[183] E. Perron, S. Diggavi, and E. Telatar, “On noise insertion strategies for wireless network 
secrecy,” in Proc. Information Theory and Applications Workshop, San Diego, CA, USA, 
February 2009, pp. 77-84. 

[184] R. Ahlswede, N. Cai, S. Li, and R. Yeung, “Network information flow,” IEEE Transactions 
on Information Theory, vol. 46, no. 4, pp. 1204-1216, July 2000. 

[185] S. Li, R. Yeung, and N. Cai, “Linear network coding,” IEEE Transactions on Information
Theory, vol. 49, no. 2, pp. 371-381, February 2003. 

[186] R. Koetter and M. Médard, “An algebraic approach to network coding,” IEEE/ACM
Transactions on Networking, vol. 11, no. 5, pp. 782-795, October 2003.

[187] T. Ho, M. Médard, R. Koetter, D. Karger, M. Effros, J. Shi, and B. Leong, “A random linear 
network coding approach to multicast,” IEEE Transactions on Information Theory, vol. 52,
no. 10, pp. 4413-4430, October 2006. 

[188] P. Chou, Y. Wu, and K. Jain, “Practical network coding,” in Proc. 41st Allerton Conference 
on Communication, Control, and Computing, Monticello, IL, USA, October 2003. 

[189] C. Gkantsidis and P. Rodriguez, “Network coding for large scale content distribution,” in 
Proc. 24th Annual Joint Conference of the IEEE Computer and Communications Societies 
(INFOCOM), Miami, FL, USA, March 2005, pp. 2235-2245. 

[190] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Médard, and J. Crowcroft, “XORs in the air: 
practical wireless network coding,” in Proc. Conference on Applications, Technologies, 
Architectures, and Protocols for Computer Communications, Pisa, Italy, August 2006, 
pp. 243-254. 

[191] N. Cai and R. Yeung, “Secure network coding,” in Proc. IEEE International Symposium on 
Information Theory, Lausanne, Switzerland, July 2002, p. 323. 

[192] L. Ozarow and A. Wyner, “Wire-tap channel II,” AT&T Bell Labs Technical Journal, vol. 63, 
no. 10, pp. 2135-2157, December 1984. 

[193] S. Rouayheb and E. Soljanin, “On wiretap networks II,” in Proc. IEEE International 
Symposium on Information Theory, Nice, France, June 2007, pp. 551-555. 

[194] K. Bhattad and K. Narayanan, “Weakly secure network coding,” in Proc. First Workshop on 
Network Coding, Theory, and Applications (NetCod), Riva del Garda, Italy, February 2005. 


[195] L. Lima, M. Médard, and J. Barros, “Random linear network coding: A free cipher?” in 
Proc. IEEE International Symposium on Information Theory, Nice, France, June 2007, 
pp. 546-550. 

[196] J. Vilela, L. Lima, and J. Barros, “Lightweight security for network coding,” in Proc. IEEE 
International Conference on Communications, Beijing, China, May 2008, pp. 1750-1754. 

[197] L. Lima, S. Gheorghiu, J. Barros, M. Médard, and A. Toledo, “Secure network coding for 
multi-resolution wireless video streaming,” IEEE Journal on Selected Areas in
Communications, vol. 28, no. 3, pp. 377-388, April 2010.

[198] S. Jaggi, M. Langberg, S. Katti, T. Ho, D. Katabi, and M. Médard, “Resilient network coding 
in the presence of Byzantine adversaries,” in IEEE INFOCOM, May 2007.

[199] C. Gkantsidis and P. Rodriguez, “Cooperative security for network coding file distribution,” 
in IEEE INFOCOM, Barcelona, Spain, April 2006.

[200] N. Cai and R. Yeung, “Network error correction,” in Proc. IEEE International Symposium 
on Information Theory, Kanagawa, Japan, July 2003, p. 101. 

[201] R. Koetter and F. Kschischang, “Coding for errors and erasures in random network coding,” 
IEEE Transactions on Information Theory, vol. 54, no. 8, pp. 3579-3591, August 2008. 

[202] M. Kim, L. Lima, F. Zhao, J. Barros, M. Médard, R. Koetter, T. Kalker, and K. Han, “On 
counteracting Byzantine attacks in network coded peer-to-peer networks,” IEEE Journal 
on Selected Areas in Communications, vol. 28, no. 5, pp. 692-702, June 2010. 

[203] P. F. Oliveira, R. A. Costa, and J. Barros, “Mobile secret key distribution with network 
coding,” in Proc. International Conference on Security and Cryptography, Barcelona, Spain, 
July 2007, pp. 1-4.

[204] M. J. Kim, J. Barros, M. Médard, and R. Koetter, “An algebraic watchdog for wireless 
network coding,” in Proc. IEEE International Symposium on Information Theory, Seoul, 
South Korea, July 2009, pp. 1159-1163. 

[205] T. Ho and D. Lun, Network Coding: An Introduction. Cambridge University Press, 2008. 

[206] C. Fragouli and E. Soljanin, “Network coding fundamentals,” Foundations and Trends in 
Networking, vol. 2, no. 1, pp. 1-133, 2007. 

[207] C. Fragouli and E. Soljanin, “Network coding applications,” Foundations and Trends in 
Networking, vol. 2, no. 2, pp. 135-269, 2007. 


Author index 


Adleman, L., 265 

Ahlswede, R., 8, 38, 110, 119, 162, 174, 309 
Alléaume, R., 175, 246 

Amariucai, G., 291 

Anantharam, V., 174 

Andersson, M., 245 

Ardestanizadeh, E., 108, 110, 291 


Barg, A., 292 

Barros, J., 208, 211, 246, 265, 291, 292, 
310 

Barry, R., 265 

Belfiore, J.-C., 246 

Bennett, C., 152, 175 

Bergmans, P., 42 

Bhattad, K., 310 

Bloch, M., 110, 175, 176, 211, 245, 246, 
265, 277, 291, 292 

Boutros, J., 175, 246 

Brassard, G., 175, 246 

Bustin, R., 210 

Buttler, W., 175 


Cachin, C., 155, 176 

Cai, N., 110, 309 

Calderbank, R., 245 

Cardinal, J., 175, 246 

Carter, J., 152, 175, 265 

Cerf, N., 175, 176, 246 

Chabanne, H., 176 

Chen, C., 176 

Chen, J., 246 

Chou, P., 310 

Chuang, I., 174 

Chung, S.-Y., 245 

Cohen, G., 245 

Cover, T., 13, 45 

Crépeau, C., 175 

Csiszár, I., 8, 13, 80, 110, 119, 162, 174-176, 292


Davis, P., 175
Debbah, M., 211 


Debuisschert, T., 176 
Diamanti, E., 176 
Diffie, W., 7, 265 
Diggavi, S., 175, 292 
Dihidar, S., 245 
Donahue, C., 175 
Dunn, B., 110 


Ekrem, E., 291 

Eldar, Y., 210 

Elkouss, D., 175, 246 

El Gamal, A., 291 

El Gamal, H., 211, 291, 292 
Erkip, E., 292 


Feder, M., 110 

Finn, S., 265 

Forney, D., 53, 110, 245 
Fossier, S., 176 

Fragouli, C., 310 
Franceschetti, M., 110, 291 
Fumaroli, G., 176 


Gündüz, D., 291 
Gallager, R., 13, 42, 45, 245 
Gander, M., 175 
Garcia-Patrón, R., 176 
Georghiades, C., 246 
Gisin, N., 174 
Gkantsidis, C., 310 
Goel, S., 210 

Gohari, A., 174 
Gopala, P., 204, 211 
Gowtham, K., 111 
Grangier, P., 176, 246 
Grosshans, F., 176 
Gursoy, M., 210 


Ha, J., 246 

Haenggi, M., 292 
Hassibi, B., 186, 210 
Hayashi, M., 110 
He, D., 246 


He, X., 289, 291, 292 
Hellman, M., 7, 184, 210, 245, 265 


Ho, T., 310
Imai, I., 176

Jaggi, S., 310
Jagmohan, A., 246
Jain, K., 310
Javidi, T., 110, 291
Jensen, A., 176
Jin, H., 223, 245

Körner, J., 13, 80, 110
Karpov, E., 176
Katti, S., 310
Khisti, A., 175, 186, 187, 210, 291, 292
Kim, M., 310
Kim, Y.-H., 110, 291
Kliewer, J., 245
Klinc, D., 246
Kobara, K., 176
Kobayashi, M., 211
Koetter, R., 310
Koyluoglu, O., 211, 291
Kramer, G., 45, 292
Kschischang, F., 310
Kwak, B.-J., 246

Lai, L., 211, 291, 292
Lamoreaux, S., 175
Laneman, J. N., 110, 211, 292
Leung-Yan-Cheong, S., 63, 184, 210
Leverrier, A., 175, 246
Li, B., 309
Li, Z., 211, 265
Liang, Y., 178, 195, 210, 211, 245, 291, 292
Liao, H., 38
Lima, L., 310
Liu, N., 210
Liu, R., 210, 211, 245, 291
Liu, S., 175
Liu, T., 186
Liveris, A., 246
Lodewyck, J., 176
Lun, D., 310
Luo, Y., 291

Médard, M., 265, 291, 310
Mandayam, N., 176
Marquis, D., 265
Marton, K., 110
Massey, J., 110
Mathur, S., 176
Matsumoto, R., 175
Matsushima, T., 111
Maurer, U., 8, 119, 139, 160, 174-176, 265
McLaughlin, S., 175, 176, 211, 245, 246, 265, 291
Menezes, A., 265 
Merhav, N., 110
Merolla, J.-M., 175, 245, 246 
Miller, R., 265 
Mitrpant, C., 291 
Morozov, K., 176 
Muramatsu, J., 138, 175, 176 


Nair, C., 87, 110 

Naito, M., 175 

Narayan, P., 175, 292 
Narayanan, K. R., 310 
Negi, R., 210 

Nguyen, K.-C., 175, 246 
Nickel, G., 175 

Nielsen, M., 174 
Nitinawarat, S., 176, 292 


Oggier, F., 186, 210, 246 
Oliveira, P., 310 
Oohama, Y., 175, 292 
Ozarow, L., 226, 245 
Ozarow, L. H., 310 


Perron, E., 292 

Peterson, C., 175 

Pinto, P., 292 

Poor, H. V., 210, 211, 245, 291, 292 
Prabhakaran, V., 175 


Rényi, A., 175 
Ramchandran, K., 175 
Rathi, V., 245 

Renner, R., 133, 174 
Reznik, A., 175, 176, 246, 292 
Ribordy, G., 174 
Richardson, T., 223, 245 
Rivest, R., 265 

Rodrigues, M., 208, 211, 265 
Rodriguez, P. R., 310 
Rouayheb, S. Y. E., 310 
Ruoheng, L., 291 


Salvail, L., 175 

Schneier, B., 265 

Shafiee, S., 210 

Shah, Y., 175, 176, 246 

Shamai, S., 186, 210, 211, 291, 292 
Shamir, A., 265 

Shannon, C., 4, 23, 49, 110, 265 
Shea, J., 211 

Sholtz, R., 211 

Simeone, O., 291 


Skoglund, M., 245 

Solé, P., 246 

Soljanin, E., 310 
Somekh-Baruch, A., 291 
Spasojević, P., 211, 245, 291 
Stallings, W., 265 

Stinson, D., 175 
Subramanian, A., 245 

Suko, T., 111 

Suresh, A., 245 


Tang, X., 211, 291 
Tchamkerten, A., 292 

Tekin, E., 284, 291 

Telatar, E., 292 

Thangaraj, A., 111, 175, 216, 228, 245, 246, 291 
Thobaben, R., 245 

Thomas, J., 13 

Tittel, W., 174 

Torgerson, J., 175 

Trappe, W., 176, 211, 265, 291 
Tse, D., 211 

Tualle-Brouri, R., 176 


Ulukus, S., 210, 291 
Urbanke, R., 245 
Uyematsu, T., 175 


Vadhan, S., 159, 176 

Van Oorschot, P., 265 

Vanstone, S., 265 

Van Assche, G., 158, 174-176, 246 
Van Dijk, M., 86, 89, 110, 175 

Van Tilborg, H., 175 

Verdú, S., 291 


Vernam, G., 110 
Verriest, E., 245 
Vilela, J., 291 
Vilela, J. P., 310
Vinck, A., 291 


Watanabe, S., 175 

Wegman, M., 152, 175, 265 

Wei, S., 291 

Wei, V., 245 

Wenger, J., 176 

Wiesel, A., 210 

Wilson, R., 211 

Win, M., 292 

Wolf, S., 133, 160, 174-176 

Wong, T., 211 

Wornell, G., 175, 186, 187, 210, 291, 292 
Wu, Y., 310 

Wyner, A., 6, 58, 61, 110, 226, 245, 310 


Xiong, Z., 246 
Xu, W., 265 


Yamamoto, H., 106, 110 
Yasui, K., 111 

Yates, R., 211 

Ye, C., 175, 176, 246, 292 
Yener, A., 284, 289, 291, 292 
Yeung, R., 13, 309 
Yoshimura, K., 175 

Youssef, M., 291 

Yuksel, M., 292 


Zémor, G., 245, 246 
Zbinden, H., 174 


Subject index 


Note: Page numbers in bold refer to figures and tables. 


achievability proof, 26 
channel coding, 30 
channel model, 162 
coded cooperative jamming, 285 
cooperative jamming, 276 
distributed source coding, 35 
Gaussian broadcast channel with confidential 
messages, 179 
Gaussian source model, 191 
source coding, 27 
achievable rate, 25, 26, 34, 38, 42, 60, 79, 116, 118, 
137, 145, 272 
advantage distillation, 136-143 
capacity, 137, 138 
protocol, 136 
rate, 137 
AEP, see asymptotic equipartition 
property 
AES, 248 
application layer, 254 
asymptotic equipartition property 
conditional, 20 
joint, 19 
strong, 19 
weak, 21 
weak joint AEP, 22 
authentication, 251, 252, 264 
unconditional, 252 
AWGN, see Gaussian channel 


base of logarithm, 15 

BEC, see binary erasure channel 

binary erasure channel, 7, 25, 32, 53, 65, 84, 85, 90, 
222, 223, 228, 229 

binary symmetric channel, 25, 32, 62, 65, 84, 85, 
90, 136, 222, 230, 233, 244 

binning, 92 

block cipher, 248 

broadcast channel, 40-44
broadcast channel with confidential messages, 78-103
Gaussian, 177-190


brute-force attack, 249 
BSC, see binary symmetric channel 


chain rule 
entropy, 16 
mutual information, 16 
channel 
broadcast, see broadcast channel 
code, see code 
discrete memoryless, see 
discrete memoryless channel 
Gaussian, see Gaussian channel 
less capable, see less capable channel 
multiple-access, see multiple-access channel 
noisier, see noisier channel 
ordered, 222 
physically degraded, see 
physically degraded channel 
stochastically degraded, see 
stochastically degraded channel, 229 
two-way wiretap, see two-way wiretap channel 
weakly symmetric, 63, 63, 65, 106 
wiretap, see wiretap channel 
channel capacity 
Gaussian channel, 32 
channel coding theorem, 29 
channel capacity, 26 
channel estimation, 264 
Chebyshev’s inequality, 14, 170, 171, 228 
Chernov bounds, 14 
code, 26, 37, 41, 59, 68, 79, 91, 105, 272 
linear, see linear code 
codebook, 26 
coded cooperative jamming, 273, 283-289 
coherence interval, 194, 194, 200, 203, 207 
computational security, 248-251 
concave, 17 
conditional mutual information, 16 
conditioning does not increase entropy, 16 
confidentiality, 251 
converse proof, 26 
broadcast channel with confidential messages, 98
channel coding, 31
channel model, 163
cooperative jamming, 277
degraded wiretap channel, 76
distributed source coding, 36
Gaussian broadcast channel with confidential
messages, 179
multiple-access channel, 40
source coding, 28
source model, 127
wiretap channel with rate-limited feedback, 166


convex, 17 
convex hull, 38, 75, 96 
cooperative jamming, 10, 273, 275-283 
coset coding, 224 
with dual of LDPC codes, 228-229 
coset code, 218, 224 
coset coding, 225 
cross-over probability, 25 
crypto lemma, 53, 56, 107, 120, 141 


data-processing inequality, 17, 32, 60, 76, 77, 85,
128, 130

degraded wiretap channel, 58 

example, 65 
digital signature, 252 
discrete memoryless channel, 25 
discrete memoryless source, 25, 33, 59 
distributed source coding, 33 
DMC, see discrete memoryless channel 
DMS, see discrete memoryless source 
dual code, 218 
dummy message, 69, 92, 103, 286 


eavesdropper’s channel, 58, 79 
entropy
binary entropy function, 15
chain rule, 16
collision entropy, 149
conditional entropy, 15
differential entropy, 18
joint entropy, 15
min-entropy, 150
Rényi entropy, 151
Shannon entropy, 15
entropy-power inequality, 18, 181
equivocation, 5, 50, 59, 79 
erasure probability, 25 
exponential-integral function, 202, 206, 208

extractor, 159, 160 


fading
block fading, 194, 203-206
ergodic fading, 194-203
quasi-static fading, 194, 206-210, 244
fading coefficient, 194
fading gain, 194


Fano’s inequality, 17, 29, 31, 51, 69, 76, 92, 98, 123, 
127, 132, 138, 145, 163, 169, 205, 277, 286 


feedback
rate-limited, 105
secure, 105
full secrecy, see secrecy condition
functional dependence graph, 22, 102 


generator matrix, 218 
GPRS, 258 
GSM, 258 


hash functions, 250 


IETF, 255 

integrity, 251 

intrinsic conditional information, 130 
reduced, 133 

IP, 247 

IPSec, 256 


Jensen’s inequality, 17, 150, 153, 180, 181 
joint AEP, 39, 44 


kernel, 187, 226 
key-distillation strategy, 113, 115, 278 


LDPC, see low-density parity-check code 
leakage, 50, 59, 92, 116, 195, 272 
less capable channel, 87, 88, 90 
linear code, 218, 224 
coset, see coset code 
dual, see dual code 
syndrome, see syndrome 
link layer, 256 


local randomness, 59, 67, 68, 79, 91, 103, 105, 113,
115, 136, 144, 272, 278, 285
low-density parity-check codes, 217-223
message-passing, 220-222, 232-233
threshold, 222
LTE, 259 


main channel, 58, 79 
Markov chain, 17 
Markov’s inequality, 13, 160, 173 
multiple-access channel, 37-40, 284, 285
mutual information, 16
chain rule, 16


near-field communication, 260 
network coding, 293 
Byzantine attacks, 306 
linear, 296 
passive attacks, 303 
protocols, 299 
vulnerabilities, 302 


network error correction, 306 
network information theory, 33 
network layer, 255 

NIST, 250 

noisier channel, 86, 88-90, 104, 106


one-time pad, 5, 52, 249, 273, 289, 290 

one-way communication, 116, 119, 121, 145, 
154 

optical communication, 257 

OSI Reference Model, 253 

outage probability, 208 


parity-check matrix, 218 
perfect secrecy, see secrecy condition 
physically degraded channel, 84, 88, 108, 165 
privacy amplification, 148-162, 243
extractor, see extractor
universal hash function, see
universal hash functions

public-key cryptography, 249 


quantization, 127, 147 


random binning, 27, 35, 123 

random coding, 30, 39, 42, 65, 68, 90, 91, 
285 

rate-equivocation region
broadcast channel with confidential messages, 79, 80
degraded wiretap channel, 60, 61
wiretap channel, 80, 81
reconciliation, 143-148, 231-242
binary source, 231-234
capacity, 145
continuous random variables, 147
direct reconciliation, 147, 155
efficiency, 147
Gaussian source, 239-242
multilevel reconciliation, 235-239 
protocol, 144, 154, 161, 232, 235, 243 
rate, 145 
reverse reconciliation, 147, 157 
RFID, 259 
RSA, 249 


secrecy-capacity region
broadcast channel with confidential messages, 81
Gaussian broadcast channel with confidential
messages, 179
secrecy capacity, 6 
Gaussian wiretap channel, 185 
wireless channel, 196, 204, 208 
wiretap channel, 60, 62, 168 


secrecy condition 
full secrecy, 60, 61, 65, 68, 70, 80, 91 
perfect secrecy, 3, 5, 49-53, 55, 60, 117, 227, 305 
strong secrecy, 55, 166 
weak secrecy, 55, 166 
secret-key agreement, 112-176
channel model, 114, 278 
Gaussian source model, 190 
source model, 113 
secret-key agreement, 262 
secret-key capacity 
channel model, 162, 166, 277 
source model, 118, 119, 138, 158 
secure channel codes, 261 
security services, 248, 251 
selection lemma, 14, 27-29, 31, 36, 72, 94, 126, 288
Shannon’s cipher system, 4, 49
side information, 37
Slepian-Wolf code, 34
Slepian-Wolf region, 34
source 
code, see source code 
discrete memoryless, see 
discrete memoryless source 
source coding, 27 
source code, 25, 34 
source coding theorem, 26 
source coding with side information, 37 
spread spectrum, 257 
SSL, 255 
stochastically degraded channel, 84, 85, 88, 90, 178, 
196 
stochastic encoder, 59 
strategy, see key-distillation strategy 
stream cipher, 249 
strong secrecy, see secrecy condition 
strong secrecy capacity 
wiretap channel, 61, 168 
strong secret-key capacity 
channel model, 162, 166 
source model, 118, 158 
superposition coding, 42, 66, 91-93 
symmetric encryption, 248 
syndrome, 218, 232 


Tanner graph, 218, 219 
TCP, 247 
time-sharing, 40, 276, 282, 290 
transport layer, 255 
two-way communication, 116 
two-way wiretap channel, 105, 270-275 
Gaussian, 270 
typical set
consistency, 19
jointly typical set, 19
jointly weak typical set, 21
strong typical set, 18
weak typical set, 21


UMTS, 259 
universal families of hash functions, 152, 252 


variational distance, 15, 74, 159, 160 
Vernam’s cipher, 5, 52 


weak secrecy, see secrecy condition 
wiretap channel, 6, 49 
binary erasure, 54, 90, 223-231 
binary symmetric, 90, 244 
complex Gaussian, 185 
degraded, 58 
type II, 227, 305 
wiretap code, 6 


X.800, 253 

